Independence screening is a variable selection method that uses
a ranking criterion to select significant variables, particularly for
statistical models with nonpolynomial dimensionality or "large p, small
n" paradigms when p can be as large as an exponential of the sample
size n. In this paper we propose a robust rank correlation screening
(RRCS) method to deal with ultra-high dimensional data. The new
procedure is based on the Kendall $\tau$ correlation coefficient between
response and predictor variables rather than the Pearson correlation
of existing methods. The new method has four desirable features
compared with existing independence screening methods. First, the sure
independence screening property can hold under only the existence of
a second order moment of predictor variables, rather than exponential
tails or the like, even when the number of predictor variables grows
exponentially fast with the sample size. Second, it can be used
to deal with semiparametric models such as transformation regression
models and single-index models under a monotonic constraint on
the link function without involving nonparametric estimation, even
when there are nonparametric functions in the models. Third, the
procedure is largely robust against outliers and influence points in
the observations. Last, the use of indicator functions in rank
correlation screening greatly simplifies the theoretical derivation due to the
boundedness of the resulting statistics, compared with previous
studies on variable screening. Simulations are carried out for comparisons
with existing methods and a real data example is analyzed.

1. Introduction. With the development of scientific techniques, ultra- 
high dimensional data sets have appeared in diverse areas of the sciences, 
engineering and humanities; Donoho (2000) and Fan and Li (2006) have 
provided comprehensive reviews. To handle statistical problems related to 
high dimensional data, variable/model selection plays an important role in 
establishing working models that include significant variables and exclude 
as many insignificant variables as possible. A very important and popu- 
lar methodology is shrinkage estimation with penalization, with examples 
given of bridge regression [Frank and Friedman (1993), Huang, Horowitz
and Ma (2008)], LASSO [Tibshirani (1996), van de Geer (2008)], elastic-net 
[Zou and Hastie (2005)], adaptive LASSO [Zou (2006)], SCAD [Fan and 
Li (2001), Fan and Peng (2004), Fan and Lv (2011)] and Dantzig selector 
[Candes and Tao (2007)]. When irrepresentable conditions are assumed, we 
can guarantee selection consistency for LASSO and Dantzig selector even 
for "large p, small n" paradigms with nonpolynomial dimensionality
(NP-dimensionality). However, directly applying LASSO or Dantzig selector to
ultra-high dimensional modeling is not a good choice because the irrep- 
resentable conditions can be rather stringent in high dimensions; see, for 
example, Lv and Fan (2009) and Fan and Lv (2010). 

Fan and Lv (2008) proposed another promising approach called sure
independence screening (SIS). This methodology has since been developed
further in the literature. Fan and Song (2010) extended SIS to ultra-high
dimensional generalized linear models, and Fan, Feng and Song (2011)
studied it for ultra-high dimensional additive models. Moreover, based on
the idea of dimension reduction, Zhu et al. (2011) suggested a model-free
feature screening method for most generalized parametric or semiparametric 
models. To sufficiently use the correlation information among the predictor 
variables, Wang (2012) proposed a factor profile sure screening method for 
the ultra-high dimensional linear regression model. Different from existing 
methods with penalization, SIS does not use penalties to shrink estima- 
tion, but ranks the importance of predictors by correlations between re- 
sponse and predictors marginally for variable/model selection. To perform 
the ranking, Pearson correlation is adopted; see Fan and Lv (2008). For NP- 
dimensionality, the tails of predictors need to be nonpolynomially light. This 
is also the case for other shrinkage estimation methods such as the LASSO 
and Dantzig selector. Moreover, to use more information among the
predictor variables for sure screening, as in Wang (2012), or to apply
the sure screening method to more general statistical models, as in Zhu
et al. (2011), more restrictive conditions, such as the normality assumption
[Wang (2012)] or the linearity and moment conditions [Zhu et al. (2011)],
need to be imposed on the predictor variables. To further improve estimation
efficiency, Fan and Lv (2008) suggested a two-stage procedure. First, SIS is
used as a fast but crude method of reducing the ultra-high dimensionality 
to a relatively large scale that is smaller than or equal to the sample size 
n; then, a more sophisticated technique can be applied to perform the final 
variable selection and parameter estimation simultaneously. Note that for 
linear models, the SIS procedure also depends on the explicit relationship 
between the Pearson correlation and the least squares estimator [Fan and 
Lv (2008)]. For generalized linear models, Fan, Samworth and Wu (2009) 
and Fan and Song (2010) selected significant predictors by sorting the corre- 
sponding marginal likelihood estimator or marginal likelihood. That method 
can be viewed as a likelihood ratio screening, as it builds on the increments 
of the log-likelihood. The rate of p also depends on the tails of predictors.
The lighter the tails are, the faster the rate of p can be. Xu and Zhu (2010) 
also showed for longitudinal data that when only the moment condition is 
assumed, the rate of p cannot exponentially diverge to infinity unless mo- 
ments of all orders exist. 

For other semiparametric models such as transformation models and
single-index models, existing SIS procedures may involve nonparametric plug-in
estimation for the unknown transformation or link function. This plug-in may
deteriorate the estimation/selection efficiency for NP-dimensionality prob- 
lems. Although the innovative sure screening method proposed by Zhu et al. 
(2011) can be applied to more general parametric or semiparametric models, 
as noted above, much more restrictive conditions are required for
the predictor variables. Zhu et al. (2011) imposed requirements on the
tails of the predictor variables, which must further satisfy the so-called
linearity condition. This condition is only slightly weaker than elliptical
symmetry of the distribution of the predictor vector [Li (1991)]. Their
sure screening method therefore does not have the robustness properties of
the method proposed in this paper. Further, when categorical variables are
among the ultra-high dimensional predictors, the restrictive conditions on
the predictor variables prevent the model-free feature screening method from
being applied directly. In addition, such a model-free feature screening
method is based on sliced inverse regression [SIR, Li (1991)]. It is well
known that SIR fails for models with a symmetric regression function; see
Cook and Weisberg (1991).

We note that the idea of SIS is based on Pearson correlation learning. 
However, the Pearson correlation is not robust against heavy tailed distri- 
butions, outliers or influence points, and the nonlinear relationship between 
response and predictors cannot be discovered by the Pearson correlation. As
suggested by Hall and Miller (2009) and Huang, Horowitz and Ma (2008),
independence screening could be conducted with other criteria. For
correlation relationships, there are several measurements in the literature, and the
Kendall $\tau$ [Kendall (1938)] is a very commonly used one that is a correlation
coefficient in a nonparametric sense. Similar to the Pearson correlation, the
Kendall $\tau$ also has wide applications in statistics. Kendall (1962) gave an
overview of its applications in statistics and showed its advantages over the
Pearson correlation. First, it is robust against heavy tailed distributions: see
Sen (1968) for parameter estimation in the linear regression model. Second,
the Kendall $\tau$ is invariant under monotonic transformation. This property
allows us to discover the nonlinear relationship between the response and
predictors. For example, Han (1987) suggested a maximum rank correlation
estimator (MRC) for the transformation regression model with an unknown
transformation link function. Third, the Kendall $\tau$ based estimation is a
U-statistic with a bounded kernel function, which provides us a chance to
obtain sure screening properties with only a moment condition. Another rank
correlation is the Spearman correlation [see, e.g., Wackerly, Mendenhall and
Scheaffer (2002)]. The Spearman rank correlation coefficient is equivalent
to the traditional linear correlation coefficient computed on ranks of items
[Wackerly, Mendenhall and Scheaffer (2002)]. The Kendall $\tau$ distance
between two ranked lists is proportional to the number of pairwise adjacent
swaps needed to convert one ranking into the other. The Spearman rank
correlation coefficient is the projection of the Kendall $\tau$ rank correlation to
linear rank statistics. The Kendall $\tau$ has become a standard statistic with
which to compare the correlation between two ranked lists. When various
methods are proposed to rank items, the Kendall $\tau$ is often used to measure
which method is better relative to a "gold standard." The higher the
correlation between the output ranking of a method and the "gold standard,"
the better the method is. Thus, we focus on the Kendall $\tau$ only. More
interestingly, the Kendall $\tau$ also has a close relationship with the Pearson
correlation, particularly when the underlying distribution of two variables is
a bivariate normal distribution (we will give the details in the next section).
As such, we can expect that a Kendall $\tau$ based screening method will benefit
from the above mentioned advantages to be more robust than the SIS.

The remainder of this paper is organized as follows. In Section 2 we give the
details of the robust rank correlation screening method (RRCS) and present
its extension to ultra-high dimensional transformation regression models. In
Section 3 the screening properties of the RRCS are studied theoretically for
linear regression models and transformation regression models. In Section 4
an iterative RRCS procedure is presented. We also discuss the application of
RRCS to generalized linear models with NP-dimensionality. Numerical studies
are reported in Section 5 with a comparison with the SIS. Section 6 con- 
cludes the paper. A real example and the proofs of the main results can be 
found in the supplementary material for the paper [Li et al. (2012)]. 
2. Robust rank correlation screening (RRCS).

2.1. Kendall $\tau$ and its relationship with the Pearson correlation.
Consider the random vectors $(X_i, Y_i)$, $i = 1, 2, \ldots, n$; the Kendall $\tau$ rank
correlation between $X_i$ and $Y_i$ is defined as

(2.1)  $\tau = \frac{1}{n(n-1)} \sum_{i \ne j}^{n} \operatorname{sgn}(X_i - X_j)\,\operatorname{sgn}(Y_i - Y_j).$

Given this definition, it is easy to see that $|\tau|$ is invariant against
monotonic transformations of $X_i$ or $Y_i$. Furthermore, if $(X_i, Y_i)$ follows a
bivariate normal distribution with mean zero and Pearson correlation $\rho$,
it can be shown that [Huber and Ronchetti (2009)]

$E(\tau) = \frac{2}{\pi} \arcsin \rho.$

In other words, when $(X_i, Y_i)$ follows a bivariate normal distribution, the
Pearson correlation and the Kendall $\tau$ have a monotonic relationship in the following
sense. If $|\rho| > c_1$ for a given positive constant $c_1$, then there exists a positive
constant $c_2$ such that $|E(\tau)| > c_2$, and $E(\tau) = 0$ if and only if $\rho = 0$. Such
a relationship helps us to obtain the sure independence screening property
for linear regression models under the assumption of Fan and Lv (2008)
without any difficulties when the Kendall $\tau$ is used.
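
The definition (2.1) and the arcsine relationship can be checked numerically. The following is a minimal sketch, not part of the original method description: the function names `kendall_tau` and `simulate_normal_pair`, the sample size and the seed are illustrative assumptions, and the quadratic-time pairwise computation is chosen for transparency rather than speed.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall tau as in (2.1): mean of sgn(x_i - x_j) * sgn(y_i - y_j) over i != j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])  # n x n matrix of sgn(x_i - x_j)
    sy = np.sign(y[:, None] - y[None, :])  # n x n matrix of sgn(y_i - y_j)
    # Diagonal entries are sgn(0) = 0, so the full sum equals the sum over i != j.
    return float((sx * sy).sum() / (n * (n - 1)))

def simulate_normal_pair(n, rho, seed=0):
    """Draw n pairs from a bivariate normal with zero means, unit variances, correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return xy[:, 0], xy[:, 1]

# Illustrative check with rho = 0.5, for which (2/pi) * arcsin(rho) = 1/3.
x, y = simulate_normal_pair(n=1500, rho=0.5)
tau_hat = kendall_tau(x, y)
tau_theory = 2.0 / np.pi * np.arcsin(0.5)
```

Applying a strictly increasing map to either coordinate, for example `kendall_tau(x, np.exp(y))`, leaves the value unchanged because only the signs of pairwise differences enter the statistic.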

When $(X_i, Y_i)$ are not bivariate normal but $\rho$ exists, according to an
approximation of the Kendall $\tau$ [Kendall (1949)], using the cumulants up to
the fourth order and the bivariate Gram-Charlier series expansion yields

$E(\tau) \approx \frac{2}{\pi} \arcsin(\rho) + \frac{1}{24\pi(1-\rho^2)^{3/2}} \{(\kappa_{40} + \kappa_{04})(3\rho - 2\rho^3) - 4(\kappa_{31} + \kappa_{13}) + 6\rho\kappa_{22}\},$

where $\kappa_{40} = \mu_{40} - 3$, $\kappa_{31} = \mu_{31} - 3\rho$, $\kappa_{22} = \mu_{22} - 2\rho^2 - 1$. If, under
certain conditions, $\kappa_{31}$ and $\kappa_{13}$ have a monotonic relationship with $\rho$, and
$\kappa_{31} = 0$ and $\kappa_{13} = 0$ when $\rho = 0$, then intuitively $E(\tau) = 0$ approximately when $\rho = 0$,
and if $|\rho| > c_1$, then there may exist $c_2$ such that $|E(\tau)| > c_2$. This means
that the Kendall $\tau$ based method may enjoy properties similar to those of the SIS
without strong conditions.

2.2. Rank correlation screening. We start our procedure with the linear 
model as 

(2.2)  $Y = X\beta + \varepsilon,$

where $Y = (Y_1, \ldots, Y_n)^T$ is an $n$-vector of response, $X = (X_1, \ldots, X_n)^T$ is an
$n \times p$ random design matrix with independent and identically distributed
$X_1, \ldots, X_n$, $\beta = (\beta_1, \ldots, \beta_p)^T$ is a $p$-vector of parameters and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$
is an $n$-vector of i.i.d. random errors independent of $X$.

To motivate our approach, we briefly review the SIS first. Let 

(2.3)  $\omega = (\omega_1, \ldots, \omega_p)^T = X^T Y,$

where each column of the $n \times p$ design matrix $X$ has been standardized with
mean zero and variance one. Then, for any given $d_n < n$, take the selected
submodel to be

$\widehat{\mathcal{M}}_{d_n} = \{1 \le j \le p : |\omega_j| \text{ is among the first } d_n \text{ largest of all}\}.$

This reduces the full model of size $p \gg n$ to a submodel with the size $d_n$.
By appropriately choosing $d_n$, all significant predictors can be selected into
the submodel indexed by $\widehat{\mathcal{M}}_{d_n}$ with probability tending to 1; see Fan and
Lv (2008).
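
The SIS ranking step just reviewed can be sketched in a few lines. This is an illustrative sketch only: the function name `sis_select` is hypothetical, and it assumes that no column of the design matrix is constant, so that standardization is well defined.

```python
import numpy as np

def sis_select(X, y, d_n):
    """Rank predictors by |omega_j| with omega = X^T y as in (2.3); keep the top d_n.

    Assumes every column of X has nonzero sample variance (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each column to mean zero and variance one.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    omega = Xs.T @ y
    # Indices of the d_n largest |omega_j|, returned in increasing index order.
    order = np.argsort(-np.abs(omega))
    return np.sort(order[:d_n]), omega
```

A typical call is `selected, omega = sis_select(X, y, d_n)` with `d_n` smaller than the sample size; the second return value exposes the marginal statistics so that a threshold rule can be applied instead of a fixed model size.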

Similar to Li, Peng and Zhu (2011), let $\omega = (\omega_1, \omega_2, \ldots, \omega_p)^T$ be a $p$-vector
with components

(2.4)  $\omega_k = \frac{1}{n(n-1)} \sum_{i \ne j}^{n} I(X_{ik} < X_{jk})\, I(Y_i < Y_j) - \frac{1}{4}, \qquad k = 1, \ldots, p,$

where $I(\cdot)$ denotes the indicator function, and $\omega_k$ is the marginal rank
correlation coefficient between $Y$ and $X_{\cdot k}$, which is equal to a quarter of the
Kendall $\tau$ between $Y$ and $X_{\cdot k}$. As a U-statistic, $\omega_k$ is easy to compute. We
can then sort the magnitudes of all the components of $\omega = (\omega_1, \ldots, \omega_p)^T$ in
a decreasing order and select a submodel

(2.5)  $\widehat{\mathcal{M}}_{d_n} = \{1 \le k \le p : |\omega_k| \text{ is among the first } d_n \text{ largest of all}\}$

or

(2.6)  $\widehat{\mathcal{M}}_{\gamma_n} = \{1 \le k \le p : |\omega_k| > \gamma_n\},$

where $d_n$ or $\gamma_n$ is a predefined threshold value. Thus, it shrinks the full
model indexed $\{1, \ldots, p\}$ down to a submodel indexed $\widehat{\mathcal{M}}_{d_n}$ or $\widehat{\mathcal{M}}_{\gamma_n}$ with
size $|\widehat{\mathcal{M}}_{d_n}| < n$ or $|\widehat{\mathcal{M}}_{\gamma_n}| < n$. Because of the robustness of the Kendall $\tau$
against heavy-tailed distributions, such a screening method is expected to
be more robust than the SIS.
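
The screening rules (2.4) to (2.6) can be sketched as follows. The function names `rrcs_omega`, `rrcs_select_top` and `rrcs_select_threshold` are hypothetical, and the pairwise indicator computation costs on the order of $n^2$ operations per predictor; it is written for clarity and is not claimed to be the authors' implementation.

```python
import numpy as np

def rrcs_omega(X, y):
    """Marginal rank statistics omega_k of (2.4), one per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    y_less = y[:, None] < y[None, :]            # I(Y_i < Y_j)
    omega = np.empty(p)
    for k in range(p):
        xk = X[:, k]
        x_less = xk[:, None] < xk[None, :]      # I(X_ik < X_jk)
        # Diagonal terms are zero because I(a < a) = 0, so this is the sum over i != j.
        omega[k] = (x_less & y_less).sum() / (n * (n - 1)) - 0.25
    return omega

def rrcs_select_top(omega, d_n):
    """Rule (2.5): indices of the d_n largest |omega_k|."""
    order = np.argsort(-np.abs(omega))
    return np.sort(order[:d_n])

def rrcs_select_threshold(omega, gamma_n):
    """Rule (2.6): indices with |omega_k| > gamma_n."""
    return np.flatnonzero(np.abs(omega) > gamma_n)
```

When a predictor is a strictly increasing function of the response and there are no ties, exactly half of the ordered pairs satisfy both indicators, so $\omega_k = 1/2 - 1/4 = 1/4$; this is the largest possible magnitude and holds even for heavy-tailed predictors such as Cauchy draws.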

Consider a more general model as 

(2.7)  $H(Y_i) = X_i^T \beta + \varepsilon_i, \qquad i = 1, \ldots, n,$

where $\varepsilon_i$, $i = 1, \ldots, n$, are i.i.d. random errors independent of $X_i$ with mean
zero and an unknown distribution $F$, and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of
parameters, its norm constrained to 1 ($\|\beta\| = 1$) for identifiability. $H(\cdot)$ is an
unspecified strictly increasing function. Model (2.7) has been studied
extensively in the econometric and bioinformatic literature and is commonly used
to stabilize the variance of the error and to normalize/symmetrize the error
distribution. With different forms of H and F, this model generates many 
different parametric families of models. For example, when H takes the form 
of a power function and F follows a normal distribution, model (2.7) reduces 
to the familiar Box-Cox transformation models [Box and Cox (1964), Bickel 
and Doksum (1981)]. If H{y) = y or H{y) = log(y), model (2.7) reduces to 
the additive and multiplicative error models, respectively. More parametric 
transformation models can be found in the work of Carroll and Ruppert 
(1988). 

For model (2.7), the invariance against any strictly increasing
transformation yields that

(2.8)  $\omega_k = \frac{1}{n(n-1)} \sum_{i \ne j}^{n} I(X_{ik} < X_{jk})\, I(H(Y_i) < H(Y_j)) - \frac{1}{4} = \frac{1}{n(n-1)} \sum_{i \ne j}^{n} I(X_{ik} < X_{jk})\, I(Y_i < Y_j) - \frac{1}{4}$

for $k = 1, \ldots, p$. That is, $\omega_k$, $k = 1, 2, \ldots, p$, can still be applicable for the
model with unknown transformation function. Therefore, the RRCS method
can also be applied to transformation regression models that establish the
nonlinear relationship between the response and predictor variables.
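
The invariance can be verified on simulated data. The sketch below assumes a hypothetical instance of model (2.7) with $H = \log$, so that $Y = \exp(X^T\beta + \varepsilon)$; the helper name `omega_k`, the dimensions, the coefficient vector and the noise scale are all illustrative choices.

```python
import numpy as np

def omega_k(x, y):
    """Marginal statistic omega_k of (2.4) for a single predictor x and response y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_less = x[:, None] < x[None, :]   # I(X_ik < X_jk)
    y_less = y[:, None] < y[None, :]   # I(Y_i < Y_j)
    return float((x_less & y_less).sum() / (n * (n - 1)) - 0.25)

# Hypothetical transformation model: H(Y) = X^T beta + eps with H = log.
rng = np.random.default_rng(0)
n, p = 80, 5
X = rng.normal(size=(n, p))
beta = np.array([0.6, 0.8, 0.0, 0.0, 0.0])     # unit norm, as required for identifiability
H_of_Y = X @ beta + 0.5 * rng.normal(size=n)   # the (unobserved) transformed response
Y = np.exp(H_of_Y)                             # the observed response

# The statistics computed from Y and from H(Y) coincide: only the order of Y matters.
omega_from_Y = np.array([omega_k(X[:, k], Y) for k in range(p)])
omega_from_HY = np.array([omega_k(X[:, k], H_of_Y) for k in range(p)])
```

Because the two vectors agree exactly, the screening step never needs an estimate of $H$.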

3. Sure screening properties of RRCS. In this section we study the
sure screening properties of RRCS for the linear regression model (2.2)
and the transformation regression model (2.7). Without loss of generality,
let $(Y_1, X_{1k})$, $(Y_2, X_{2k})$ be independent copies of $(Y, X_k)$, where
$EY = EX_k = 0$ and $EY^2 = EX_k^2 = 1$, $k = 1, \ldots, p$, and assume that

$\mathcal{M}_* = \{1 \le k \le p : \beta_k \ne 0\}$

is the true sparse model with nonsparsity size $s_n = |\mathcal{M}_*|$, recalling that
$\beta = (\beta_1, \ldots, \beta_p)^T$ is the true parameter vector. The complement of $\mathcal{M}_*$ is

$\mathcal{M}_*^c = \{1 \le k \le p : \beta_k = 0\}.$

Furthermore, for $k = 1, \ldots, p$, let $\rho_k = \operatorname{corr}(X_k, Y)$ for model (2.2) and
$\rho_k^* = \operatorname{corr}(X_k, H(Y))$ for model (2.7). Recall the definition of $\omega = (\omega_1, \ldots, \omega_p)^T$
in (2.4) for both (2.2) and (2.7).

The following marginal conditions on the models are needed to ensure the 
sure screening properties of RRCS. 

Marginally symmetric condition and Multi-modal condition: For
model (2.2):

(M1) Denote $\Delta Y = Y_1 - Y_2$, then the conditional distribution $F_{\Delta Y|\Delta X_k}(t)$
is symmetric about zero when $k \in \mathcal{M}_*^c$, where $\Delta X_k = X_{1k} - X_{2k}$.
(M2) Denote $\Delta\varepsilon_k = Y_1 - Y_2 - \rho_k(X_{1k} - X_{2k})$ and $\Delta X_k = X_{1k} - X_{2k}$, then
the conditional distribution $F_{\Delta\varepsilon_k|\Delta X_k}(t) = \pi_{0k} F_0(t, \sigma_0^2|\Delta X_k) +
(1 - \pi_{0k}) F_1(t, \sigma_1^2|\Delta X_k)$ follows a symmetric finite mixture distribution where $F_0(t, \sigma_0^2|\Delta X_k)$
follows a symmetric unimodal distribution with the conditional variance $\sigma_0^2$
related to $\Delta X_k$ and $F_1(t, \sigma_1^2|\Delta X_k)$ is a symmetric distribution function with
the conditional variance $\sigma_1^2$ related to $\Delta X_k$ when $k \in \mathcal{M}_*$. $\pi_{0k} \ge \pi^*$, where $\pi^*$
is a given positive constant in $(0, 1]$ for any $\Delta X_k$ and any $k \in \mathcal{M}_*$.

For model (2.7):

(M1') Denote $\Delta H(Y) = H(Y_1) - H(Y_2)$, where $H(\cdot)$ is the link function of
the transformation regression model (2.7), and $\Delta X_k = X_{1k} - X_{2k}$. The
conditional distribution $F_{\Delta H(Y)|\Delta X_k}(t)$ is symmetric about zero when $k \in \mathcal{M}_*^c$.

(M2') Denote $\Delta\varepsilon_k = H(Y_1) - H(Y_2) - \rho_k^*(X_{1k} - X_{2k})$ and $\Delta X_k = X_{1k} -
X_{2k}$, where $H(\cdot)$ is the link function of the transformation regression
model (2.7), then the conditional distribution $F_{\Delta\varepsilon_k|\Delta X_k}(t) = \pi_{0k} F_0(t, \sigma_0^2|\Delta X_k) +
(1 - \pi_{0k}) F_1(t, \sigma_1^2|\Delta X_k)$ follows a symmetric finite mixture distribution where
$F_0(t, \sigma_0^2|\Delta X_k)$ follows a symmetric unimodal distribution with the
conditional variance $\sigma_0^2$ related to $\Delta X_k$ and $F_1(t, \sigma_1^2|\Delta X_k)$ is a symmetric
distribution function with the conditional variance $\sigma_1^2$ related to $\Delta X_k$ when
$k \in \mathcal{M}_*$. $\pi_{0k} \ge \pi^*$, where $\pi^*$ is a given positive constant in $(0, 1]$ for any $\Delta X_k$
and any $k \in \mathcal{M}_*$.

Remark 1. According to the definition and symmetric form of $\Delta Y$, $\Delta X_k$
and $\Delta\varepsilon_k$, the marginally symmetric conditions (M2) and (M2') are very mild.
When $\pi^*$ is small enough, the distribution is close to $F_1$, which is naturally
symmetric and has no stringent constraint.

A special case is that the conditional distribution of $\varepsilon_{ik} = Y_i - \rho_k X_{ik}$ or
$\varepsilon_{ik} = H(Y_i) - \rho_k^* X_{ik}$, given $X_{ik}$ ($i = 1, \ldots, n$), is homogeneous (not depending
on $X_{ik}$) with a finite number of modes. Actually, when this condition holds,
the conditional distribution of $\varepsilon_{ik}$ given $X_{ik}$ is identical to the
corresponding unconditional marginal distribution. Note that $\Delta\varepsilon_k = \varepsilon_{1k} - \varepsilon_{2k}$. When
$\varepsilon_{ik}$, $i = 1, 2$, follows a multimodal distribution $F_\varepsilon(t)$ with no more than $K$
modes, where $K$ is not related to $k$ and $n$, such a distribution function
can be rewritten as a weighted sum of $K$ unimodal distributions $F_i(\cdot)$ as

$F_\varepsilon(t) = \sum_{i=1}^{K} \pi_i F_i(t),$

where $\pi_i \ge 0$, $i = 1, \ldots, K$, with $\sum_{i=1}^{K} \pi_i = 1$. Then it is easy to see that the
distribution of $\Delta\varepsilon_k = \varepsilon_{1k} - \varepsilon_{2k}$ has the following form:

$F_{\Delta\varepsilon_k}(t) = \sum_{i=1}^{K} \sum_{j=1}^{K} \pi_i \pi_j F^*_{ij}(t) = \sum_{i=1}^{K} \pi_i^2 F^*_{ii}(t) + \sum_{i \ne j} \pi_i \pi_j F^*_{ij}(t) =: \pi_0^* F_0^*(t) + (1 - \pi_0^*) F_1^*(t),$

where $F^*_{ij}(t)$, $i, j = 1, \ldots, K$, are the distributions of the differences of two
independent variables, that is, $Z_i - Z_j$ where $Z_i$ follows the distribution
$F_i(t)$ and $Z_j$ follows the distribution $F_j(t)$, respectively. Because $F_i(t)$, $i =
1, \ldots, K$, are unimodal distributions, $F^*_{ii}$, $i = 1, \ldots, K$, are then symmetric
unimodal distributions. Hence, $F_0^*(t)$ is a symmetric unimodal distribution.
It is also easy to see that $F_1^*(t)$ is a symmetric multimodal distribution
function. On the other hand, $\pi_0^* = \sum_{i=1}^{K} \pi_i^2 \ge \frac{1}{K} (\sum_{i=1}^{K} \pi_i)^2 = 1/K$.
As such, (M2) or (M2') is satisfied.
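
The lower bound on the weight of the unimodal component follows from the Cauchy-Schwarz inequality. A short worked derivation, writing $\pi_i$ for the mixture weights, $K$ for the number of components and $\pi_0^*$ for the sum of the squared weights:

```latex
\begin{align*}
1 = \Bigl(\sum_{i=1}^{K} \pi_i \cdot 1\Bigr)^{2}
  \le \Bigl(\sum_{i=1}^{K} \pi_i^{2}\Bigr)\Bigl(\sum_{i=1}^{K} 1^{2}\Bigr)
  = K \sum_{i=1}^{K} \pi_i^{2}
\quad\Longrightarrow\quad
\pi_0^{*} = \sum_{i=1}^{K} \pi_i^{2} \ge \frac{1}{K}.
\end{align*}
```

Equality holds when all weights equal $1/K$, so the constant $\pi^*$ in (M2) or (M2') can be taken as $1/K$ in this special case.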

Other than the marginally symmetric conditions, we also need the follow- 
ing regularity conditions: 

(C1) As $n \to +\infty$, the dimensionality of $X$ satisfies $p = O(\exp(n^\delta))$ for
some $\delta \in (0, 1)$, satisfying $\delta + 2\kappa < 1$ for any $\kappa \in (0, \frac{1}{2})$.

(C2) $c_{\mathcal{M}_*} = \min_{k \in \mathcal{M}_*} E|X_{1k}|$ is a positive constant and is free of $p$.

(C3) The predictors $X_i$ and the error $\varepsilon_i$, $i = 1, \ldots, n$, are independent of
one another.

Remark 2. Condition (C1) guarantees that for the independence
screening method, we can select significant predictors into a working submodel
with probability tending to 1. SIS also needs this condition; see Fan and Lv 
(2008) and Fan and Song (2010). Condition (C2) is a mild technical con- 
dition that ensures the sure screening property of the RRCS procedure. It 
is worth mentioning that we do not need to have a uniform bound for all
$EX_{1k}^2$. If the size of $\mathcal{M}_*$ goes to infinity with a relatively slow speed, we can
relax this condition to $c_{\mathcal{M}_*} > c n^{-\iota}$ for some positive constant $c$ and $\iota \in (0, 1)$
with a suitable choice of the threshold $\gamma_n$. Precisely, $\gamma_n$ can be chosen as
$c' n^{-\kappa-\iota}$ for some positive constant $c'$ where $\kappa$ satisfies $2\kappa + 2\iota < 1$. From
Theorem 1 below, we can see that $|E(\omega_k)| > c n^{-\kappa-\iota}$ for $k \in \mathcal{M}_*$. To ensure
the sure screening properties, (C1) needs to be changed to $\delta + 2\kappa + 2\iota < 1$.

Theorem 1. Under the regularity condition (C2) and the marginal
symmetric conditions (M1) and (M2) for model (2.2), we have the following:

(i) $E(\omega_k) = 0$ if and only if $\rho_k = 0$.

(ii) If $|\rho_k| > c_1 n^{-\kappa}$ for $k \in \mathcal{M}_*$ with a positive constant $c_1 > 0$, then there
exists a positive constant $c_2$ such that $\min_{k \in \mathcal{M}_*} |E(\omega_k)| > c_2 n^{-\kappa}$.

For model (2.7), replacing conditions (M1) and (M2) with (M1') and (M2'),
then:

(i') $E(\omega_k) = 0$ if and only if $\rho_k^* = 0$.
(ii') If $|\rho_k^*| > c_1 n^{-\kappa}$ for $k \in \mathcal{M}_*$ with a positive constant $c_1 > 0$, then there
exists a positive constant $c_2$ such that $\min_{k \in \mathcal{M}_*} |E(\omega_k)| > c_2 n^{-\kappa}$.

Remark 3. As Fan and Song (2010) mentioned, the marginally
symmetric condition (M1) is weaker than the partial orthogonality condition
assumed by Huang, Horowitz and Ma (2008), that is, $\{X_k, k \in \mathcal{M}_*^c\}$ is
independent of $\{X_k, k \in \mathcal{M}_*\}$, which can lead to the model selection consistency
for the linear model. Our results, together with the following Theorem 2,
indicate that under weaker conditions, consistency can also be achieved even
for transformation regression models. Furthermore, as in the discussion of
Fan and Song (2010), a necessary condition for the sure screening is that
the significant predictors $X_k$ with $\beta_k \ne 0$ are correlated with the response
in the sense that $\rho_k \ne 0$. The result (i) of Theorem 1 also shows that when
the Kendall $\tau$ is used, this property holds, which suggests that the
insignificant predictors in $\mathcal{M}_*^c$ can be detected from $E(\omega_k)$ at the
population level. Result (ii) indicates that under marginally symmetric conditions,
a suitable threshold $\gamma_n$ can entail the sure screening in the sense of

$\min_{k \in \mathcal{M}_*} |E(\omega_k)| \ge \gamma_n, \qquad \max_{k \in \mathcal{M}_*^c} |E(\omega_k)| = 0.$

Remark 4. As a by-product, Theorem 1 reveals the relationship
between the Pearson correlation and the Kendall $\tau$ under general conditions,
especially the multi-modal conditions (M2) or (M2'), which in itself is of
interest. However, either condition (M2) or (M2') is a sufficient condition to
guarantee that the Kendall $\tau$ has either the property (ii) or (ii') of
Theorem 1, and then has the sure screening property. As in the discussion in
Section 2.1, following the high order bivariate Gram-Charlier series expansion
to approximate the joint distribution of $(X_k, Y)$, under certain conditions
such as either the condition or sub-Gaussian tail condition, we could also 
obtain results similar to those of Theorem 1. It would involve some high order
moments or cumulants. However, as shown in Theorem 1, either the
multi-modal condition (M2) or (M2') ensures the robust properties of the
proposed RRCS, and depicts those properties more clearly. Furthermore, we
will show in the proposition below that the bivariate normal copula family
also provides another sufficient condition for the results of Theorem 1 to hold.

Bivariate normal copula family based marginal condition: We give
another sufficient condition on $(X_k, Y)$ for the results of Theorem 1 to hold.
Consider the bivariate normal copula family which is defined as

$C_\theta(u_1, u_2) = \Phi_\theta(\Phi^{-1}(u_1), \Phi^{-1}(u_2)), \qquad 0 < u_1, u_2 < 1,$

where $\Phi_\theta$ is a bivariate standard normal distribution function with mean
zero, variance one and correlation $\theta$, and $\Phi$ is the one-dimensional standard
normal distribution function. Let $\mathcal{F}$ denote the collection of all distribution
functions on $R$. We then define the bivariate distribution family $\mathcal{P}$ as

$\mathcal{P} = \{C_\theta(F_X(x), F_Y(y)),\ (x, y) \in R^2,\ F_X \in \mathcal{F},\ F_Y \in \mathcal{F}\}.$

Copulas are now a popular tool to study the dependence among multivariate
random variables. For details, see Nelsen (2006). The normal copula family
is an important copula family in practice. Particularly, the bivariate normal
copula family can be used to approximate most of the distributions of
bivariate continuous or discrete random vectors; for example, see Cario and
Nelson (1997), Ghosh and Henderson (2003), Pitt, Chan and Kohn (2006)
and Channouf and L'Ecuyer (2009).
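
A member of this family with non-normal marginals can be simulated by pushing correlated normal draws through the normal distribution function and then through marginal quantile functions. The sketch below is illustrative: the names `sample_from_family`, `exp_quantile` and `cauchy_quantile`, the exponential and Cauchy marginals, and the value $\theta = 0.6$ are assumptions made for the example.

```python
import math
import numpy as np

def kendall_tau(x, y):
    """Kendall tau: mean of sgn(x_i - x_j) * sgn(y_i - y_j) over i != j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / (n * (n - 1)))

def normal_cdf(z):
    """Standard normal distribution function, computed elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def sample_from_family(n, theta, quantile_x, quantile_y, seed=0):
    """Draw (X, Y) with normal copula C_theta and marginals given by quantile functions."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, theta], [theta, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)   # latent normal pair
    u = normal_cdf(z)                                       # uniform marginals, copula C_theta
    return quantile_x(u[:, 0]), quantile_y(u[:, 1]), z

def exp_quantile(u):
    """Quantile function of the standard exponential distribution."""
    return -np.log1p(-u)

def cauchy_quantile(u):
    """Quantile function of the standard Cauchy distribution."""
    return np.tan(np.pi * (u - 0.5))

x, y, z = sample_from_family(1000, 0.6, exp_quantile, cauchy_quantile)
tau_xy = kendall_tau(x, y)
tau_latent = kendall_tau(z[:, 0], z[:, 1])
tau_theory = 2.0 / np.pi * np.arcsin(0.6)
```

Since every map applied to the latent normal pair is strictly increasing, `tau_xy` coincides with `tau_latent`, and both estimate $\frac{2}{\pi}\arcsin\theta$ even though one marginal has no finite mean.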

Based on the results of Klaassen and Wellner (1997) and the monotonic
relationship between the Kendall $\tau$ and the Pearson correlation, the
multi-modality can be replaced by the above copula distribution family. A
proposition is stated below.

Proposition 1. Under the marginal symmetric condition (M1) for
model (2.2), we have the following:

(i) $E(\omega_k) = 0$ if and only if $\rho_k = 0$.

(ii) If $|\rho_k| > c_1 n^{-\kappa}$ with a positive constant $c_1 > 0$ and the joint
distribution $F(x, y)$ of $(X_k, Y)$ is in $\mathcal{P}$, for $k \in \mathcal{M}_*$, then there exists a positive
constant $c_2$ such that $\min_{k \in \mathcal{M}_*} |E(\omega_k)| > c_2 n^{-\kappa}$.

For model (2.7), replacing condition (M1) with (M1'), then:

(i') $E(\omega_k) = 0$ if and only if $\rho_k^* = 0$.

(ii') If $|\rho_k^*| > c_1 n^{-\kappa}$ with a positive constant $c_1 > 0$ and the joint
distribution $F(x, y)$ of $(X_k, Y)$ is in $\mathcal{P}$ for $k \in \mathcal{M}_*$, then there exists a positive
constant $c_2$ such that $\min_{k \in \mathcal{M}_*} |E(\omega_k)| > c_2 n^{-\kappa}$.

Remark 5. If the joint distribution of $(X, Y)$ is in $\mathcal{P}$ with the formula
$F(X, Y) = C_\theta(F_X(X), F_Y(Y))$, the results of Klaassen and Wellner (1997)
suggested that $|\theta|$ equals the maximum correlation coefficient between $X$
and $Y$. As shown in the proof of the proposition, when we replace $\rho$ by $\theta$
in the proposition, the results continue to hold. Hence, this proposition
provides a bridge between our method and the generalized correlation proposed
by Hall and Miller (2009) because, according to their definitions, the
generalized correlation coefficient is an approximation of the maximum correlation
coefficient.

Sure screening property of RRCS: Based on Theorem 1 or Proposition 1, 
the sure screening property and model selection consistency of RRCS are 
stated in the following results. 

Theorem 2. Under the conditions (C1)-(C3), and the conditions of
Theorem 1 or Proposition 1 corresponding to either model (2.2) or
model (2.7), for some $0 < \kappa < 1/2$ and $c_3 > 0$, there exists a positive constant
$c_4 > 0$ such that

$P\bigl(\max_{1 \le j \le p} |\omega_j - E(\omega_j)| \ge c_3 n^{-\kappa}\bigr) \le p\{\exp(-c_4 n^{1-2\kappa})\}.$

Furthermore, by taking $\gamma_n = c_5 n^{-\kappa}$ with $c_5 \le c_2/2$, if $|\rho_k| > c_1 n^{-\kappa}$ for $k \in \mathcal{M}_*$,
we have

$P(\mathcal{M}_* \subset \widehat{\mathcal{M}}_{\gamma_n}) \ge 1 - 2|\mathcal{M}_*|\{\exp(-c_4 n^{1-2\kappa})\}.$

Remark 6. Theorem 2 shows that RRCS can handle the NP-dimensionality
problem for linear and semiparametric transformation regression models.
It also permits $\log p = o(n^{1-2\kappa})$, which is identical to that in Fan and
Lv (2008) for the linear model and is faster than $\log p = o(n^{(1-2\kappa)/A})$ with
$A = \max(\alpha + 4, 3\alpha + 2)$ for some positive $\alpha$ in Fan and Song (2010) when
the likelihood ratio screening is used.

Remark 7. It is obvious that when the joint distribution of $(X_i^T, Y_i)$ follows
a multivariate normal distribution, conditions (M1) and (M2) are
automatically valid. The results of sure screening properties are equivalent to those of
Fan and Lv (2008) under weaker conditions. This is because of the definition
of the rank correlation Kendall $\tau$ and its monotonic relationship with the
Pearson correlation as in the discussion in Section 2. The Kendall $\tau$ can be
regarded as a U-statistic and uses the indicator function as the link function.
As the indicator function is a bounded function, the exponential U-statistic
inequality can be used to directly control the tail of the rank correlation
Kendall $\tau$ rather than those of $X_i$ and $Y_i$.
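
The role of boundedness can be made concrete with Hoeffding's inequality for U-statistics. The sketch below is not the authors' proof. It assumes only that $\omega_k + 1/4$ is a U-statistic of order two whose symmetrized kernel $h$ takes values in $[0, 1]$, and the constants it produces are illustrative rather than the $c_3$, $c_4$ of Theorem 2.

```latex
% h: symmetrized kernel of \omega_k + 1/4, with 0 <= h <= 1 (assumption stated above).
\begin{align*}
P\bigl(|\omega_k - E(\omega_k)| \ge t\bigr)
  &\le 2\exp\bigl(-2\lfloor n/2 \rfloor\, t^{2}\bigr)
  && \text{(Hoeffding, kernel range at most 1)},\\
P\Bigl(\max_{1 \le k \le p} |\omega_k - E(\omega_k)| \ge c_3 n^{-\kappa}\Bigr)
  &\le 2p\exp\bigl(-2\lfloor n/2 \rfloor\, c_3^{2}\, n^{-2\kappa}\bigr)
  && \text{(union bound with } t = c_3 n^{-\kappa}\text{)}.
\end{align*}
```

Since $\lfloor n/2 \rfloor \ge n/3$ for $n \ge 2$, the exponent is of order $n^{1-2\kappa}$ and involves no tail condition on the predictors or the response, which matches the form of the bound in Theorem 2.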

Under the conditions of Proposition 1, following similar steps, the same 
results of Theorem 2 and the following Theorem 3 can be obtained with- 
out any difficulties. Thus, we only present the relevant results without the 
detailed technical proofs. 

The following theorem states that the size of $\widehat{\mathcal{M}}_{\gamma_n}$ can be controlled by
the RRCS procedure.

Theorem 3. Under the conditions (C1)-(C3), and conditions of
Theorem 1 or Proposition 1 for model (2.2), when $|\rho_k| > c_1 n^{-\kappa}$ for some positive
constant $c_1$ uniformly in $k \in \mathcal{M}_*$, for any $\gamma_n = c_5 n^{-\kappa}$ there exists a constant
$c_6 > 0$ such that

(3.1)  $P\bigl(|\widehat{\mathcal{M}}_{\gamma_n}| \le O\{n^{2\kappa} \lambda_{\max}(\Sigma)\}\bigr) \ge 1 - p\{\exp(-c_6 n^{1-2\kappa})\},$

where $\Sigma = \operatorname{Cov}(X_i)$ and $X_i = (X_{i1}, \ldots, X_{ip})$. For model (2.7), in addition
to conditions (C1)-(C3) and the marginal symmetric conditions (M1') and
(M2'), when $|\rho_k^*| > c_1 n^{-\kappa}$ for some positive constant $c_1$ uniformly in $k \in \mathcal{M}_*$
and $\operatorname{Var}(H(Y)) = O(1)$, for $\gamma_n = c_5 n^{-\kappa}$ there exists a constant $c_6 > 0$ such
that the above inequality (3.1) holds.
Remark 8. Compared with Theorem 5 of Fan and Song (2010), the
conditions of Theorem 3 are much weaker and the obtained inequalities are
much simpler in form although the rates are similar. The number of selected
predictors is of the order $\|\Sigma\beta\|^2/\gamma_n^2$, which is bounded by $O\{n^{2\kappa} \lambda_{\max}(\Sigma)\}$
when $\operatorname{Var}(H(Y)) = O(1)$. Hence, when $\lambda_{\max}(\Sigma) = O(n^\tau)$, the size of the
selected predictors is of the order $O(n^{2\kappa+\tau})$, which can be smaller than $n$
when $2\kappa + \tau < 1$.

From Theorems 1-3, the rank correlation has sure screening properties
and model selection consistency. However, it is also obvious that it does
not sufficiently use all of the information from data, particularly the
correlations of predictors. Hence, like most other sure screening methods,
the rank sure screening can be regarded only as an initial model selection
reducing the ultra-high dimension down to a dimension smaller than the
sample size n without losing any important significant predictor variables.
As the numerical results in Section 5 and the discussion of Fan and Lv (2008)
show, the correlation of predictors could seriously affect the sure screening
results, and thus more subtle sure screening methods, such as Iterative Sure
Independence Screening (ISIS) [Fan and Lv (2008)], are needed.

4. IRRCS: Iterative robust rank correlation screening. 

4.1. IRRCS. With RRCS, the dimension can be brought down to a value 
smaller than the sample size with a probability tending to one. Thus, we can 
work on a smaller submodel. However, in most situations, RRCS can be only 
regarded as a crude model selection method, and the resulting model may 
still contain many superfluous predictors. It is partly because strong correla- 
tion always exists between predictors when too many predictors are involved 
[see Fan and Lv (2008)], and the basic sure screening methods do not use 
this correlation information. We also face some other issues. First, in model- 
ing high dimensional data, it is often a challenge to determine outliers. High 
dimensionality also increases the likelihood of extreme values of predictors. 
Second, even when the model dimension is smaller than the sample size, 
the design matrix may still be near singular when strong correlation exists 
between predictors. Third, the usual normal or sub-Gaussian distributional 
assumption on predictors/errors is not easy to substantiate. Fourth, it is 
also an unfortunate fact that the RRCS procedure may break down if a pre- 
dictor is marginally unrelated but jointly related with the response, or if 
a predictor is jointly unrelated with the response but has higher marginal 
correlation with the response than some significant predictors. To deal with 
these issues, we develop a robust iterative RRCS (IRRCS) that is motivated 
by the concept of Iterative Sure Independence Screening (ISIS) in Fan and 
Lv (2008).

To this end, we first briefly describe a penalized smoothing maximum
rank correlation estimator (PSMRC) suggested by Lin and Peng (2013).
This estimation approach is applied to simultaneously further select and
estimate a final working submodel by working on $\beta$.

For model (2.7), the monotonicity of $H$ and the independence of $X$ and $\varepsilon$
ensure that

$P(Y_i > Y_j \mid X_i, X_j) > P(Y_i < Y_j \mid X_i, X_j)$ whenever $X_i^T\beta > X_j^T\beta$.

Hence, $\beta$ can be estimated by maximizing

(4.1)  $G_n(\beta) = \frac{1}{n(n-1)} \sum_{i \ne j} I(Y_i > Y_j)\, I(X_i^T\beta > X_j^T\beta)$.

It is easy to see that $G_n(\beta)$ is another version of the Kendall $\tau$ between $Y_i$
and $X_i^T\beta$. The maximum rank correlation [MRC; Han (1987)] estimator $\hat\beta_n$
can be applied to estimate $\beta$. When $p$ is fixed, the $n^{1/2}$-consistency and
the asymptotic normality of $\hat\beta_n$ have been derived. However, because $G_n(\beta)$
is not a smooth function, the Newton-Raphson algorithm cannot be used
directly, and the optimization of $G_n(\beta)$ requires an intensive search at heavy
computational cost. We then consider PSMRC as follows. Define

(4.2)  $L_n(\beta) = S_n(\beta) - \sum_{j=1}^{d} p_{\lambda_n}(|\beta_j|)$

and

(4.3)  $S_n(\beta) = \frac{1}{n(n-1)} \sum_{i \ne j} I(Y_i > Y_j)\, \Phi\bigl((X_i - X_j)^T\beta/h\bigr)$,

where $\Phi(\cdot)$ is the standard normal distribution function, a smooth function
used to reduce the computational burden, $h$ is a small positive
constant, and $p_\lambda(|\cdot|)$ is a penalty function of $L_1$ type such as that in LASSO,
SCAD or MCP. It is easy to see that if $h \to 0$, $\Phi((X_i - X_j)^T\beta/h) \to I(X_i^T\beta > X_j^T\beta)$.
As $L_n(\beta)$ is a smooth function of $\beta$, traditional optimization methods,
such as the Newton-Raphson algorithm or the more recently developed LARS [Efron
et al. (2004)] and LLA [Zou and Li (2008)], can be used to obtain the
maximizer of $L_n(\beta)$ and thereby achieve the selection and estimation
of $\beta$ simultaneously. For model (2.2), the problem is easier and we do not
repeat the description of the estimation for it.
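
As a rough illustration of (4.1)-(4.3), the following Python sketch evaluates the MRC objective, its smoothed version and a penalized objective on simulated data. It is only a sketch under stated assumptions: it uses the standard library alone, a plain $L_1$ penalty stands in for the generic penalty $p_\lambda$, and all function names and the simulated data are hypothetical rather than taken from Lin and Peng (2013).

```python
import math
import random

def std_normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def linear_index(x, beta):
    """Inner product x' beta for one observation."""
    return sum(a * b for a, b in zip(x, beta))

def mrc_objective(beta, X, y):
    """G_n(beta): share of ordered pairs with Y_i > Y_j and X_i'beta > X_j'beta."""
    n = len(y)
    idx = [linear_index(x, beta) for x in X]
    hits = sum(1 for i in range(n) for j in range(n)
               if i != j and y[i] > y[j] and idx[i] > idx[j])
    return hits / (n * (n - 1))

def smoothed_objective(beta, X, y, h):
    """S_n(beta): the indicator on the index is replaced by Phi((X_i - X_j)'beta / h)."""
    n = len(y)
    idx = [linear_index(x, beta) for x in X]
    total = sum(std_normal_cdf((idx[i] - idx[j]) / h)
                for i in range(n) for j in range(n)
                if i != j and y[i] > y[j])
    return total / (n * (n - 1))

def penalized_objective(beta, X, y, h, lam):
    """L_n(beta) with an L1 penalty lam * sum |beta_j| (an assumed choice of penalty)."""
    return smoothed_objective(beta, X, y, h) - lam * sum(abs(b) for b in beta)

# Illustrative data: a monotone (exponential) transformation of a linear index.
random.seed(0)
beta_true = [3.0, 1.5, 2.0]
X = [[random.gauss(0.0, 1.0) for _ in beta_true] for _ in range(40)]
y = [math.exp(linear_index(x, beta_true) + random.gauss(0.0, 1.0)) for x in X]

g_value = mrc_objective(beta_true, X, y)
s_value = smoothed_objective(beta_true, X, y, h=1e-6)
print(g_value, s_value)  # the two values nearly coincide for small h
```

The bandwidth $h$ trades smoothness against fidelity to the indicator: a very small $h$ reproduces $G_n$ almost exactly but makes the surface nearly flat between jumps, so a gradient-based optimizer would normally use a moderate value.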

Next, we introduce the intuition behind the proposed IRRCS for the
transformation regression model. The same idea also applies to the
linear model, since it is a special transformation regression model. In fact,
given the i.i.d. sequences $Y_i$ and $X_i^T\beta$, $i = 1, \ldots, n$, define $Y^*_{ij} = I(Y_i < Y_j)$
and $X^*_{ij}(\beta) = I(X_i^T\beta < X_j^T\beta)$. Then the Pearson correlation between $Y^*_{ij}$ and
$X^*_{ij}(\beta)$ is the rank correlation Kendall $\tau$ between $Y_i$ and $X_i^T\beta$. According to
the idea of the maximum rank correlation [MRC; Han (1987)] estimator, the
estimate of $\beta$ for the transformation regression model just maximizes the
Pearson correlation between $Y^*_{ij}$ and $X^*_{ij}(\beta)$, or the rank correlation Kendall $\tau$
between $Y_i$ and $X_i^T\beta$. If we do not care about the norm of $\beta$, the least squares
estimate of $\beta$ in the linear model just maximizes the Pearson correlation
between $Y_i$ and $X_i^T\beta$. If we regard the transformation model as the following
special linear model:

$Y^*_{ij} = X^*_{ij}(\beta) + \varepsilon_{ij}$,

where $\varepsilon_{ij} = I(\varepsilon_i < \varepsilon_j)$, then it is easy to see that MRC for the
transformation model and the least squares estimate for the linear model are based
on a similar principle and, hence, the idea of Iterative Sure Independence
Screening (ISIS) for the linear model in Fan and Lv (2008) can be used
for the transformation model. Based on this intuitive insight, our proposed
IRRCS procedure is as follows:

Step 1. First the RRCS procedure is used to reduce the original dimension
to a value $[n/\log n]$ smaller than $n$. Then, based on the joint information
from the $[n/\log n]$ predictors that survive after the RRCS, we select a subset
of $d_1$ predictors $\mathcal{M}_1 = \{X_{i_1}, \ldots, X_{i_{d_1}}\}$ by a model selection method such
as the nonconcave penalized M-estimation proposed by Li, Peng and Zhu
(2011) for model (2.2) and the penalized smoothing maximum correlation
estimator [Lin and Peng (2013)] for model (2.7).

Step 2. Let $X_{i,\mathcal{M}_1} = (X_{i1}, \ldots, X_{id_1})^T$ be the $d_1 \times 1$ vector selected in
step 1, and $l = 1, \ldots, p - d_1$.

• For model (2.2), define $Y_i^* = Y_i - X_{i,\mathcal{M}_1}^T \hat\beta_{\mathcal{M}_1}$; then the Kendall $\tau$ values
for the remaining $p - d_1$ predictors are calculated as follows:

$\omega_l = \frac{1}{n(n-1)} \sum_{i \ne j} I(Y_i^* < Y_j^*)\, I(X_{il} < X_{jl}) - \frac{1}{4}$,

where $\hat\beta_{\mathcal{M}_1}$ is a vector estimator of the $d_1$ nonzero coefficients that are
estimated by the nonconcave penalized M-estimate method in Li, Peng
and Zhu (2011). Sort the $p - d_1$ values of the $|\omega_l|$ again and select another
subset of $[n/\log n]$ predictors from $\mathcal{M} - \mathcal{M}_1$.

• For model (2.7), define $I(Y_i^*, Y_j^*) = I(Y_i, Y_j) - I(X_{i,\mathcal{M}_1}^T \hat\beta_{\mathcal{M}_1} < X_{j,\mathcal{M}_1}^T \hat\beta_{\mathcal{M}_1})$,
where $I(Y_i, Y_j) = I(Y_i < Y_j)$ and $\hat\beta_{\mathcal{M}_1}$ is the estimator of the $d_1$ nonzero
coefficients, which are estimated with the penalized smoothing maximum
correlation estimator of Lin and Peng (2013). Then, compute the Kendall $\tau$
for the remaining $p - d_1$ predictors as

$\omega_l = \frac{1}{n(n-1)} \sum_{i \ne j} I(Y_i^*, Y_j^*)\, I(X_{il} < X_{jl}) - \frac{1}{4}$,

and sort the $p - d_1$ values of the $|\omega_l|$ again and select a subset of $[n/\log n]$
predictors as in step 1.

Step 3. Replace $Y_i$ by $Y_i^*$ in (2.2) and $I(Y_i, Y_j)$ with $I(Y_i^*, Y_j^*)$ in (4.2),
and select a subset of $d_2$ predictors $\mathcal{M}_2 = \{X_{i_1}, \ldots, X_{i_{d_2}}\}$ from the joint
information of the $[n/\log n]$ predictors that survived in step 2 as in step 1.

Step 4. Iterate steps 2 and 3 until $k$ disjoint subsets $\mathcal{M}_1, \ldots, \mathcal{M}_k$ are
obtained whose union $\mathcal{M} = \bigcup_{i=1}^{k} \mathcal{M}_i$ has a size $d$ less than sample size $n$.
In the implementation, we can choose, for example, the largest $k$ such that
$|\mathcal{M}| < n$.
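
The marginal screening part of the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the screening statistic has the centered pairwise-indicator form $\frac{1}{n(n-1)}\sum_{i\ne j} I(X_{il} < X_{jl}) I(Y_i < Y_j) - \frac14$, it uses only the standard library, and the function names, the simulated data and the coefficient vector passed to the residual step are hypothetical (in the procedure itself the coefficients come from a penalized fit, which is not reproduced here).

```python
import math
import random

def kendall_omega(x, y):
    """omega = (1/(n(n-1))) * sum_{i != j} I(x_i < x_j) I(y_i < y_j) - 1/4."""
    n = len(y)
    hits = sum(1 for i in range(n) for j in range(n)
               if i != j and x[i] < x[j] and y[i] < y[j])
    return hits / (n * (n - 1)) - 0.25

def rrcs_select(columns, y, d):
    """Indices of the d predictors with the largest |omega| (marginal screening)."""
    scores = [abs(kendall_omega(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return order[:d]

def residual_response(y, rows, beta_hat):
    """Step 2 for the linear model: Y_i* = Y_i - X_{i,M1}' beta_hat."""
    return [yi - sum(a * b for a, b in zip(row, beta_hat))
            for yi, row in zip(y, rows)]

# Illustrative data: n = 60, p = 30, only the first three predictors matter.
random.seed(0)
n, p = 60, 30
columns = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [5 * columns[0][i] + 5 * columns[1][i] + 5 * columns[2][i]
     + random.gauss(0.0, 1.0) for i in range(n)]

d = int(n / math.log(n))  # [n / log n], which is 14 for n = 60
selected = rrcs_select(columns, y, d)

# Residual step with an assumed (not fitted) coefficient vector for X_1..X_3.
rows_m1 = [[columns[k][i] for k in (0, 1, 2)] for i in range(n)]
y_star = residual_response(y, rows_m1, [5.0, 5.0, 5.0])
```

In an iterative run, `rrcs_select` would be applied again to the remaining columns with `y_star` in place of `y`, and the selected sets would be accumulated until their union approaches the sample size.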

4.2. Discussion on RRCS for generalized linear and single-index models.
Consider the generalized linear model

(4.4)  $f_Y(y, \theta) = \exp\{y\theta - b(\theta) + c(y)\}$

for known functions $b(\cdot)$ and $c(\cdot)$ and unknown function $\theta$, where the
dispersion parameter is not considered, as only the mean regression is modeled. The
function $\theta$ is usually called the canonical or natural parameter, and the
following structure of the generalized linear model is often considered:

(4.5)  $E(Y \mid X = x) = b'(\theta(x)) = g^{-1}\Bigl(\sum_{j=0}^{p} \beta_j x_j\Bigr)$,

where $x = (x_0, \ldots, x_p)^T$ is a $(p+1)$-dimensional predictor, $x_0 = 1$ represents
the intercept, and $\theta(x) = \sum_{j=0}^{p} \beta_j x_j$. In this case, $g(\cdot)$ should be a strictly
increasing function. Thus, we may use $\omega_k$ of (2.8) with this function to rank
the importance of the predictors. Although the idea seems straightforward,
the technical details are not easily handled, and we leave them to further
study. In the simulations, we examine its performance; see the details in
Section 5. In addition, after reducing the dimension, we consider estimating
the parameters in the working submodel. Again, we can also see that

$P(Y_i > Y_j \mid X_i, X_j) > P(Y_i < Y_j \mid X_i, X_j)$ whenever $X_i^T\beta > X_j^T\beta$.

Hence, Han's (1987) MRC estimator can be used. Fan and Song (2010)
applied the idea of SIS to (4.4) with NP-dimensionality, and used the
maximum marginal likelihood estimator (MMLE). They showed that the MMLE
$\beta_j^M = 0$ if and only if $\mathrm{Cov}(b'(X^T\beta), X_j) = \mathrm{Cov}(Y, X_j) = 0$. That is, MMLE is
equivalent to the Pearson correlation in a certain sense when SIS is applied.

A further generalization is with unknown canonical link function $g(\cdot)$. In
this case, the generalized linear model can be regarded as a special single
index model with a strictly increasing restriction as the link function $b'(\cdot)$
or $g(\cdot)$. Based on the discussion in Section 2, we can also use the Kendall $\tau$
based method to select predictors and PSMRC to estimate the parameters.
The selection and estimation could be more robust than with the MMLE
based SIS.

5. Numerical studies and application.

5.1. Simulations. In the first 4 examples, we compare the performance of
the five methods: SIS, ISIS, RRCS, IRRCS, and the generalized correlation
rank method (gcorr) proposed by Hall and Miller (2009) by computing the
frequencies with which the selected models include all of the variables in the
true model, that is, their ability to correctly screen out unimportant variables.
The simulation examples cover the linear models used by Fan and Lv (2008),
the transformation models used by Lin and Peng (2013), the Box-Cox
transformation model used by Hall and Miller (2009), and the generalized linear
models used by Fan and Song (2010). We also use a "semi-real" example
as Example 5, in which a part of the data are from a real data set and the
other part of the data are artificial. The difference from the other examples
is that this data set contains categorical data.

Example 1. Consider the following linear model:

(5.1)  $Y_i = X_i^T\beta + \varepsilon_i$,  $i = 1, \ldots, n$,

where $\beta = (5, 5, 5, 0, \ldots, 0)^T$, $X_i = (X_{1i}, \ldots, X_{pi})^T$ is a $p$-dimensional
predictor and the noise $\varepsilon_i$ is independent of the predictors, and is generated from
three different distributions: the standard normal, the standard normal with
10% of the outliers following the Cauchy distribution and the standard $t$
distribution with three degrees of freedom. The first $k = 3$ predictors are
significant, but the others are not. $X_i$ are generated from a multivariate normal
distribution $N(0, \Sigma)$ with entries of $\Sigma = (\sigma_{ij})_{p \times p}$ being $\sigma_{ii} = 1$, $i = 1, \ldots, p$,
and $\sigma_{ij} = \rho$, $i \ne j$. For some combinations with $p = 100, 1000$, $n = 20, 50, 70$
and $\rho = 0, 0.1, 0.5, 0.9$, the experiment is repeated 200 times.
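
For readers who want to reproduce a design of this kind, a minimal data-generation sketch follows. It is an assumption-laden illustration rather than the authors' simulation code: the equicorrelated normal predictors are built from a shared factor (valid for a common correlation between 0 and 1), the "10% outliers" case is read as a mixture that draws from a standard Cauchy with probability 0.1, and all names are hypothetical.

```python
import math
import random

def equicorrelated_row(p, rho, rng):
    """One draw from N(0, Sigma) with unit variances and common correlation rho in [0, 1]."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)]

def draw_noise(kind, rng):
    """Noise from N(0,1), a 10% Cauchy-contaminated N(0,1), or t with 3 degrees of freedom."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "contaminated":
        if rng.random() < 0.10:
            return math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy draw
        return rng.gauss(0.0, 1.0)
    if kind == "t3":
        z = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        return z / math.sqrt(chi2 / 3.0)
    raise ValueError(f"unknown noise kind: {kind}")

def simulate_example1(n, p, rho, kind, seed=0):
    """Linear model with beta = (5, 5, 5, 0, ..., 0) and the chosen noise distribution."""
    rng = random.Random(seed)
    X = [equicorrelated_row(p, rho, rng) for _ in range(n)]
    eps = [draw_noise(kind, rng) for _ in range(n)]
    y = [5.0 * (x[0] + x[1] + x[2]) + e for x, e in zip(X, eps)]
    return X, y, eps

X, y, eps = simulate_example1(n=50, p=100, rho=0.5, kind="t3")
```

Each column has variance $\rho + (1 - \rho) = 1$ and any two columns share covariance $\rho$ through the common factor, which matches the covariance structure described above.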

As different methods may select a working model with different sizes, to
ensure a fair comparison, we select the same size of $n - 1$ predictors using the
four methods. Then we check their selection accuracy in including the true
model $\{X_1, X_2, X_3\}$. The details of ISIS can be found in Section 4 of Fan and
Lv (2008). In Table 1, we report the proportions of selected models that
contain the true model for RRCS, SIS, IRRCS and ISIS.

From Table 1, we can draw the following conclusions: 

(1) When noise $\varepsilon$ is drawn from the standard normal, SIS and ISIS
perform better than RRCS and IRRCS, with higher proportions of selected
models containing the true model. The difference becomes smaller
with a larger sample size and smaller p. ISIS and IRRCS can greatly improve
the performance of SIS and RRCS. IRRCS can outperform ISIS.

(2) When $\rho = 0.5$ or $0.9$, SIS and RRCS perform worse than in the cases
with $\rho = 0$ or $0.1$. This coincides with our intuition that high collinearity
deteriorates the performance of SIS and RRCS.

(3) It is also worth mentioning that even when there are outliers or
heavy-tailed errors, RRCS is not necessarily better than SIS. This is an
interesting observation. However, when we note the signal-to-noise ratio, we
may have an answer. Regardless of outliers, model (5.1) has a large
signal-to-noise ratio by taking the nonzero coefficients $(\beta_1, \beta_2, \beta_3) = (5, 5, 5)$. This
means that the impact of the outliers on the results is relatively small and
RRCS, a nonparametric method, may not be able to show its advantages.
We have also tried other simulations with smaller signal-to-noise ratios or
larger percentages of outliers. When the data have larger percentages of outliers,
the performance of RRCS was better than that of SIS. Especially when iteration
is used, IRRCS can outperform the corresponding ISIS even in the case
without outliers. When the data have smaller signal-to-noise ratios, for
example, $(\beta_1, \beta_2, \beta_3, 0, \ldots, 0) = (1, 2/3, 1/3, 0, \ldots, 0)$, though the performance
of SIS and RRCS is comparable and encouraging, none of the results is
as good as the results of SIS and RRCS in Table 1. This is reasonable, as
for all variable selection methods, the phenomenon is the same: when the
signal-to-noise ratio becomes smaller, selecting significant predictors gets
more difficult.

(4) When the data are contaminated with 10% outliers or are generated
from the $t(3)$ distribution, the IRRCS performs better than the ISIS
procedure because we use the nonconcave penalized M-estimation in the iterative
step for IRRCS.

Table 1

Example 1: the proportion of predictors containing the true model $\{X_1, X_2, X_3\}$ selected by RRCS, SIS, IRRCS and ISIS. Within each error distribution the four columns are $\rho = 0, 0.1, 0.5, 0.9$.

                    N(0,1)                        N(0,1) with 10% outliers      t(3)
(p,n)     Method    0      0.1    0.5    0.9      0      0.1    0.5    0.9      0      0.1    0.5    0.9
(100,20)  RRCS      0.765  0.745  0.605  0.405    0.840  0.835  0.730  0.640    0.850  0.840  0.765  0.520
          SIS       0.835  0.875  0.725  0.650    0.810  0.845  0.705  0.590    0.775  0.805  0.600  0.315
          IRRCS     0.840  0.905  0.865  0.915    0.995  0.980  0.960  0.895    0.995  1      0.995  0.930
          ISIS      1      1      0.985  0.985    0.885  0.850  0.855  0.845    0.895  0.910  0.865  0.845
(100,50)  RRCS      1      1      1      0.985    0.980  0.960  0.970  0.930    1      0.995  0.980  0.965
          SIS       1      1      1      1        0.960  0.950  0.970  0.915    0.965  0.970  0.960  0.920
          IRRCS     1      1      1      1        1      1      1      0.970    1      1      1      0.990
          ISIS      1      1      1      1        0.985  0.975  0.975  0.945    1      0.980  0.955  …

Fragments of further rows, in reading order:
(…)       IRRCS     …, 0.480, …, 0.465, 0.860, 0.895, 0.680, 0.580
(…)       ISIS      0.835, 0.865, 0.715, 0.530, 0.795, 0.840, 0.650, 0.430, 0.805, 0.855, …, 0.460, …, 0.825

Example 2. Consider Example III in Section 4.2.3 of Fan and Lv (2008)
with the underlying model, for $X = (X_1, \ldots, X_p)^T$,

(5.2)  $Y = 5X_1 + 5X_2 + 5X_3 - 15\sqrt{\rho}\,X_4 + X_5 + \varepsilon$,

except that $X_1$, $X_2$, $X_3$ and noise $\varepsilon$ are distributed identically to those in
Example 1 above. For model (5.2), $X_4 \sim N(0, 1)$ has correlation coefficient $\sqrt{\rho}$
with all other $p - 1$ variables, whereas $X_5 \sim N(0, 1)$ is uncorrelated with all
the other $p - 1$ variables. $X_5$ has the same proportion of contributions to the
response as $\varepsilon$ does, and has an even weaker marginal correlation with $Y$ than
$X_6, \ldots, X_p$ do. We take $\rho = 0.5$ for simplicity. We generate 200 data sets for
this model and report in Table 2 the proportion of RRCS, SIS, IRRCS and
ISIS that can include the true model.
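
A short covariance check may help to see why marginal screening struggles in this design. It assumes, following Example III of Fan and Lv (2008), that the coefficient of $X_4$ in (5.2) is $-15\sqrt{\rho}$, that $\mathrm{corr}(X_4, X_j) = \sqrt{\rho}$ for $j = 1, 2, 3$, that all predictors have unit variance, and that $X_5$ and $\varepsilon$ are uncorrelated with $X_4$:

```latex
\begin{aligned}
\operatorname{Cov}(X_4, Y)
 &= 5\sum_{j=1}^{3}\operatorname{Cov}(X_4, X_j)
    - 15\sqrt{\rho}\,\operatorname{Var}(X_4)
    + \operatorname{Cov}(X_4, X_5)
    + \operatorname{Cov}(X_4, \varepsilon) \\
 &= 15\sqrt{\rho} - 15\sqrt{\rho} + 0 + 0 = 0 .
\end{aligned}
```

Under these assumptions $X_4$ enters the model with a large coefficient yet is marginally uncorrelated with $Y$, so any purely marginal ranking tends to place it among the noise variables; this is the situation the iterative procedures are designed for.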

The results in Table 2 allow us to draw different conclusions than those
from Example 1. Even in the case without outliers or heavy-tailed errors,
SIS and ISIS are not definitely better than RRCS and IRRCS, respectively,
whereas in the cases with outliers or heavy-tailed errors IRRCS works well
and better than ISIS without exception. However, the small proportions
for RRCS and SIS show their poor performance.

Table 2

For Example 2: the proportion of RRCS, SIS, IRRCS and ISIS that include the true
model $\{X_1, X_2, X_3, X_4, X_5\}$ ($\rho = 0.5$)

                 N(0,1)                    N(0,1) with 10% outliers    t(3)
p      Method    n=20   n=50   n=70       n=20   n=50   n=70         n=20   n=50   n=70
100    RRCS      0      0.305  0.595      0      0.220  0.575        0      0.305  0.575
       SIS       0      0.285  0.535      0      0.195  0.525        0      0.240  0.535
       IRRCS     0      0.500  0.820      0      0.495  0.815        0      0.530  0.805
       ISIS      0      0.465  0.855      0      0.415  0.805        0      0.405  0.775
1000   RRCS      0      0      0          0      0      0            0      0      0
       SIS       0      0      0          0      0      0            0      0      0
       IRRCS     0      0.035  0.085      0      0.030  0.055        0      0.030  0.085
       ISIS      0      0.045  0.090      0      0.015  0.035        0      0      0.020

Example 3. Consider the following generalized Box-Cox transformation
model:

(5.3)  $H(Y_i) = X_i^T\beta + \varepsilon_i$,  $i = 1, 2, \ldots, n$,

where the transformation functions are unknown. In the simulations, we
consider the following forms:

• Box-Cox transformation, $H(Y) = (Y^{\lambda} - 1)/\lambda$, where $\lambda = 0.25, 0.5, 0.75$;

• Logarithm transformation function, $H(Y) = \log Y$.

The linear regression model and the logarithm transformation model are
special cases of the generalized Box-Cox transformation model with $\lambda = 1$
and $\lambda = 0$, respectively. Again, noise $\varepsilon_i$ follows the same distributions as those in
the above examples, $\beta = (3, 1.5, 2, 0, \ldots, 0)^T$ and $\beta/\|\beta\| = (0.7682, 0.3841,
0.5121, 0, \ldots, 0)^T$ is a $p \times 1$ vector, and a sample of $(X_1, \ldots, X_p)^T$ with size $n$
is generated from a multivariate normal distribution $N(0, \Sigma)$ whose
covariance matrix $\Sigma = (\sigma_{ij})_{p \times p}$ has entries $\sigma_{ii} = 1$, $i = 1, \ldots, p$, and $\sigma_{ij} = \rho$, $i \ne j$.
The number of replications is again 200, and $p = 100, 1000$, $n = 20, 50, 70$ and
$\rho = 0, 0.1, 0.5, 0.9$, respectively. We also compare the proposed method with
the generalized correlation rank method (gcorr) proposed by Hall and Miller
(2009) for the logarithm transformation model (the results for the Box-Cox
transformation model are similar).
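
One reason a rank-based statistic is attractive for transformation models is that it does not change when the response is passed through a strictly increasing function, whereas the Pearson correlation does. The sketch below illustrates this on simulated data. It assumes the Box-Cox form $H(y) = (y^{\lambda} - 1)/\lambda$ with $H(y) = \log y$ at $\lambda = 0$ and the centered pairwise-indicator form of the rank statistic; the function names and data are hypothetical.

```python
import math
import random

def kendall_omega(x, y):
    """omega = (1/(n(n-1))) * sum_{i != j} I(x_i < x_j) I(y_i < y_j) - 1/4."""
    n = len(y)
    hits = sum(1 for i in range(n) for j in range(n)
               if i != j and x[i] < x[j] and y[i] < y[j])
    return hits / (n * (n - 1)) - 0.25

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def box_cox(y, lam):
    """H(y) = (y**lam - 1)/lam for lam != 0 and log(y) for lam == 0; requires y > 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

# Illustrative data: log(Y) is linear in x, so Y itself is a skewed response.
random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(30)]
y = [math.exp(1.5 * xi + random.gauss(0.0, 1.0)) for xi in x]

omega_raw = kendall_omega(x, y)
omegas = {lam: kendall_omega(x, [box_cox(v, lam) for v in y])
          for lam in (0, 0.25, 0.5, 0.75)}

pearson_raw = pearson(x, y)
pearson_log = pearson(x, [math.log(v) for v in y])
```

Every entry of `omegas` equals `omega_raw` because the transformation preserves the ordering of the responses, while `pearson_raw` and `pearson_log` differ; a Pearson-based ranking therefore depends on the unknown $H$, and the rank-based one does not.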

From Tables 3 and 4, we can see clearly that without exception RRCS 
outperforms SIS and gcorr significantly and IRRCS can greatly improve the 
performance of RRCS. 

Table 3

Proportion of SIS, RRCS and IRRCS that include the true model $\{X_1, X_2, X_3\}$ for the Box-Cox transformation model. Within each error distribution the four columns are $\rho = 0, 0.1, 0.5, 0.9$.

                            N(0,1)                          N(0,1) with 10% outliers        t(3)
(p,n)      λ     Method     0      0.1    0.5    0.9        0      0.1    0.5    0.9        0      0.1    0.5    0.9
(100,20)   0.75  SIS        0.415  0.470  0.190  0.030      0.380  0.435  0.170  0.005      0.420  0.525  0.355  0.200
                 RRCS       0.440  0.525  0.400  0.225      0.430  0.510  0.370  0.220      0.525  0.555  0.450  0.220
                 IRRCS      0.985  0.975  0.975  0.850      0.940  0.910  0.875  0.755      0.960  0.945  0.925  0.840
           0.5   SIS        0.320  0.390  0.155  0.005      0.265  0.345  0.160  0.005      0.360  0.490  0.325  0.090
                 RRCS       0.435  0.525  0.400  0.225      0.450  0.510  0.390  0.195      0.590  0.545  0.355  0.225
                 IRRCS      0.985  0.970  0.945  0.860      0.900  0.890  0.885  0.745      0.935  0.920  0.910  0.815
           0.25  SIS        0.150  0.195  0.090  0.0025     0.145  0.155  0.085  0.0015     0.190  0.225  0.175  0.005
                 RRCS       0.435  0.535  0.395  0.225      0.425  0.495  0.365  0.220      0.560  0.440  0.385  0.185
                 IRRCS      0.975  0.985  0.960  0.845      0.905  0.885  0.870  0.680      0.910  0.915  0.895  0.785
(100,50)   0.75  SIS        0.935  0.915  0.855  0.415      0.875  0.905  0.795  0.385      0.890  0.910  0.850  0.850
                 RRCS       0.965  0.985  0.955  0.890      0.965  0.985  0.945  0.870      0.960  0.985  0.910  0.875
                 IRRCS      1      1      1      0.980      1      1      0.965  0.925      1      1      0.960  0.910
           0.5   SIS        0.935  0.905  0.810  0.390      0.795  0.845  0.740  0.355      0.855  0.890  0.730  0.380
                 RRCS       0.965  0.985  0.950  0.890      0.950  0.980  0.950  0.880      0.955  0.940  0.930  0.840
                 IRRCS      1      1      1      0.980      1      1      0.955  0.915      1      1      0.955  0.930
           0.25  SIS        0.815  0.880  0.680  0.305      0.680  0.740  0.585  0.260      0.760  0.860  0.720  0.370
                 RRCS       0.965  0.985  0.955  0.900      0.955  0.985  0.955  0.885      0.900  0.985  0.945  0.865
                 IRRCS      1      1      1      0.970      1      1      0.975  0.915      1      1      0.985  0.910
(1000,50)  0.75  SIS        0.615  0.605  0.145  0          0.515  0.490  0.130  0          0.530  0.570  0.130  0.005
                 RRCS       0.750  0.705  0.485  0.230      0.640  0.650  0.435  0.215      0.710  0.640  0.435  0.180
                 IRRCS      1      1      1      0.840      0.940  0.925  0.940  0.780      0.930  0.940  0.935  0.710
           0.5   SIS        0.490  0.510  0.110  0          0.366  0.370  0.080  0          0.455  0.390  0.150  0
                 RRCS       0.760  0.705  0.465  0.245      0.735  0.655  0.440  0.215      0.745  0.625  0.430  0.170
                 IRRCS      1      1      1      0.815      0.950  0.920  0.930  0.770      0.975  0.965  0.940  0.745
           0.25  SIS        0.200  0.215  0.035  0          0.145  0.160  0.020  0          0.155  0.210  0.055  0
                 RRCS       0.755  0.695  0.470  0.240      0.675  0.665  0.440  0.215      0.755  0.615  0.375  0.215
                 IRRCS      1      1      1      0.780      0.945  0.930  0.940  0.720      0.955  0.930  0.935  0.725
(1000,70)  0.75  SIS        0.860  0.860  0.375  0.005      0.670  0.690  0.270  0.015      0.840  0.865  0.370  0.105
                 RRCS       0.880  0.890  0.725  0.515      0.880  0.880  0.695  0.510      0.915  0.885  0.700  0.395
                 IRRCS      1      1      1      0.970      0.960  0.945  0.935  0.910      0.970  0.985  0.930  0.915
           0.5   SIS        0.775  0.765  0.275  0.0015     0.555  0.585  0.230  0          0.760  0.750  0.280  0.0015
                 RRCS       0.885  0.900  0.715  0.470      0.865  0.875  0.670  0.515      0.915  0.875  0.610  0.440
                 IRRCS      1      1      1      0.950      0.955  0.945  0.935  0.900      0.955  0.950  0.915  0.875
           0.25  SIS        0.435  0.445  0.010  0          0.365  0.290  0.075  0          0.440  0.440  0.010  0
                 RRCS       0.875  0.880  0.725  0.490      0.830  0.795  0.710  0.500      0.835  0.830  0.655  0.410
                 IRRCS      1      1      1      0.920      0.960  0.940  0.935  0.900      0.955  0.935  0.925  0.885

Table 4

Proportion of SIS, gcorr, RRCS and IRRCS that include the true model for the logarithm transformation model. Within each error distribution the four columns are $\rho = 0, 0.1, 0.5, 0.9$.

                      N(0,1)                          N(0,1) with 10% outliers        t(3)
(p,n)      Method     0      0.1    0.5    0.9        0      0.1    0.5    0.9        0      0.1    0.5    0.9
(100,20)   SIS        0.100  0.060  0.070  0.030      0.055  0.065  0.020  0.020      0.040  0.060  0.030  0.015
           gcorr      0.280  0.230  0.105  0.010      0.205  0.215  0.180  0.010      0.185  0.230  0.170  0.015
           RRCS       0.580  0.460  0.385  0.290      0.570  0.410  0.375  0.215      0.575  0.425  0.355  0.170
           IRRCS      1      0.975  0.975  0.715      0.875  0.870  0.875  0.560      0.905  0.875  0.840  0.580
(100,50)   SIS        0.550  0.650  0.450  0.225      0.470  0.585  0.395  0.250      0.470  0.585  0.455  0.230
           gcorr      0.940  0.925  0.890  0.430      0.855  0.880  0.825  0.385      0.870  0.885  0.860  0.410
           RRCS       0.960  0.985  0.975  0.880      0.960  0.975  0.965  0.930      0.985  0.975  0.945  0.865
           IRRCS      1      1      1      0.980      1      1      1      0.955      0.990  1      1      0.975
(1000,50)  SIS        0.035  0.020  0.005  0          0.015  0.005  0.020  0.010      0.020  0.010  0.005  0
           gcorr      0.420  0.415  0.285  0.015      0.385  0.405  0.025  0.005      0.340  0.410  0.265  0.010
           RRCS       0.610  0.670  0.490  0.225      0.630  0.590  0.400  0.200      0.605  0.650  0.495  0.155
           IRRCS      1      1      1      0.855      0.925  0.900  0.915  0.685      1      1      0.990  0.660
(1000,70)  SIS        0.125  0.080  0.005  0          0.075  0.040  0.005  0          0.080  0.055  0.010  0.005
           gcorr      0.695  0.640  0.615  0.230      0.625  0.630  0.440  0.185      0.590  0.625  0.480  0.205
           RRCS       0.915  0.845  0.785  0.475      0.870  0.880  0.665  0.485      0.860  0.840  0.650  0.450
           IRRCS      1      1      1      0.940      1      1      0.960  0.930      1      1      1      0.925

Example 4 (Logistic regression). In this example, the data $(X_1^T, Y_1), \ldots,
(X_n^T, Y_n)$ are independent copies of a pair $(X^T, Y)$, where the conditional
distribution of the response $Y$ given $X$ is a binomial distribution with [...]

The predictors are generated in the same setting as that of Fan and Song
(2010), that is,

$X_j = \dfrac{\varepsilon_j + a_j\varepsilon}{\sqrt{1 + a_j^2}}$,

where $\varepsilon$ and $\{\varepsilon_j\}_{j=1}^{[p/3]}$ are i.i.d. standard normal, $\{\varepsilon_j\}_{j=[p/3]+1}^{[2p/3]}$ are i.i.d. and
follow a double exponential distribution with location parameter zero and
scale parameter one, and $\{\varepsilon_j\}_{j=[2p/3]+1}^{p}$ are i.i.d. and follow a mixture
normal distribution with two components $N(-1, 1)$, $N(1, 0.5)$ and equal mixture
proportion. The predictors are standardized to be mean zero and variance
one. The constants $\{a_j\}_{j=1}^{q}$ are the same and chosen such that the
correlation $\rho = \mathrm{corr}(X_i, X_j) = 0, 0.2, 0.4, 0.6$ and $0.8$, among the first $q$ predictors,
and $a_j = 0$ for $j > q$. Parameter $q$ is also related to the overall correlation in
the covariance matrix.

We vary the size of the nonsparse set of coefficients as $s = 3, 6, 12, 15$
and $24$, and present the numerical results with $q = 15$ and $q = 50$.
Every method is evaluated by summarizing the median minimum model size
(MMMS) of the selected model and its associated RSD, which is the
associated interquartile range (IQR) divided by 1.34. The results, based on
200 replications in each scenario, are recorded in Tables 5-7. The results of
SIS-based MLR, SIS-based MMLE, LASSO and SCAD in Tables 5-7 are
cited from Fan and Song (2010).
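
The two summary measures can be computed as in the following sketch. It is an illustration under assumptions: the minimum model size is taken to be the rank position of the worst-ranked true predictor when predictors are ordered by the absolute value of their screening statistic, the quartiles use Python's "inclusive" interpolation rule (other conventions give slightly different IQRs), and the function names and input values are hypothetical.

```python
import statistics

def minimum_model_size(scores, true_indices):
    """Smallest k such that the k top-ranked predictors contain every true predictor."""
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    position = {j: rank + 1 for rank, j in enumerate(order)}
    return max(position[j] for j in true_indices)

def mmms_and_rsd(sizes):
    """Median of the minimum model sizes and the robust spread RSD = IQR / 1.34."""
    q1, _, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    return statistics.median(sizes), (q3 - q1) / 1.34

# One replication: predictors 0 and 2 are the true ones, ranked 1st and 3rd.
size = minimum_model_size([0.9, 0.1, 0.5, 0.7], true_indices=[0, 2])

# Summary over several replications (made-up sizes for illustration).
mmms, rsd = mmms_and_rsd([3, 3, 4, 5, 8, 3, 6, 3])
```

Dividing the IQR by 1.34 puts it on the scale of a standard deviation for normally distributed data, which is why the reported RSD values cluster at multiples of about 0.746 when the IQR is an integer.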

From Tables 5-7, we can see that the RRCS procedure does a very rea- 
sonable job similar to the SIS proposed by Fan and Song (2010) in screening 
insignificant predictors, and similarly sometimes outperforms LASSO and 
SCAD for NP-dimensional generalized linear models. 

Example 5 (Logistic regression). This example is based on a real data 
set from Example 11.3 of Albright, Winston and Zappe (1999). This data set 
consists of 208 employees with complete information on 8 recorded variables. 
These variables include employee's annual salary in thousands of dollars 
(Salary); educational level (EduLev), a categorical variable with categories 1 
(finished school), 2 (finished some college courses), 3 (obtained a bachelor's 
degree), 4 (took some graduate courses), 5 (obtained a graduate degree); 
job grade (JobGrade), a categorical variable indicating the current job level, 
the possible levels being 1-6 (6 the highest); year that an employee was 
hired (YrHired); year that an employee was born (YrBorn); a categorical 
variable with values "Female" and "Male" (Gender), 1 for female employee 
and for male employee; number of years of work experience at another 
bank prior to working at the Fifth National Bank (YrsPrior); a dummy 
variable with value 1 if the employee's job is computer related and value 
otherwise (PC Job). Such a data set had been analyzed by Fan and Peng 



Table 5 

The MMMS and associated RSD (in parenthesis) of the simulated examples for logistic regressions when p = 40,000 



p 


n 


SIS-MLR 




SIS-MMLE 


RRCS 


n 


SIS-MLR 




SIS-MMLE 


RRCS 












Setting 


: 1, g=15 














s - 




= (1,1-3,1) 






s — 


D,P = 


- (1,1.3,1,...) 







300 


3 (1) 




3 (1) 


3 (0.74) 


300 


47 (164) 




50 (170) 


56 (188.05) 


0.2 


200 


3 (0) 




3 (0) 


3 (0) 


300 


6 (0) 




6 (0) 


6 (0.74) 


0.4 


200 


3 (0) 




3 (0) 


3 (0) 


300 


7 (1) 




7 (1) 


7 (1.49) 


0.6 


200 


3 (1) 




3 (1) 


3 (0.74) 


300 


8 (1) 




8 (2) 


8 (2.23) 


U.o 




4 (IJ 




4 (1) 


4 (2) 




9 (3) 




9 (3) 


y (Z.Z6) 






s — 


12,(3 


= (1, 1-3, . . .) 






s - 


= 15, /3 


= (1, 1-3, . . .) 







500 


297 (589) 




302.5 (597) 


298 (488) 


600 


350 (607) 




359.5 (612) 


359.5 (657.08) 


0.2 


300 


13 (1) 




13 (1) 


13 (1.49) 


300 


15 (0) 




15 (0) 


15 (0) 


0.4 


300 


14 (1) 




14 (1) 


14 (0.74) 


300 


15 (0) 




15 (0) 


15 (0) 


0.6 


300 


14 (1) 




14 (1) 


14 (1.49) 


300 


15 (0) 




15 (0) 


15 (0) 


0.8 


300 


14 (1) 




1/1 / 1 \ 

14 (1) 


14 (0.74) 

Setting 


300 

:2, g = 50 


15 (0) 




15 (0) 


1 r /n^ 

15 (0) 






s ■- 


= 3,/3 


= (1,1.3,1)^ 






s — 


6,/3 = 


(1,1.3,1,...)^ 







300 


3(1) 




3(1) 


3 (0.74) 


500 


6(1) 




6(1) 


6(2) 


0.2 


300 


3(0) 




3(0) 


3(0) 


500 


6(0) 




6(0) 


6(0) 


0.4 


300 


3(0) 




3(0) 


3(0) 


500 


6(1) 




6(1) 


7 (1.49) 


0.6 


300 


3(1) 




3(1) 


3(1) 


500 


8.5 (4) 




9(5) 


8 (3.73) 


0.8 


300 


5(4) 




5(4) 


5 (3.73) 


500 


13.5 (8) 




14 (8) 


15 (7.46) 






s — 


12, /3 


= (1,1.3,...)^ 






s - 




= (1,1.3,...)^ 







600 


77 (114) 




78.5 (118) 


95 (115) 


800 


46 (82) 




47 (83) 


46 (83.88) 


0.2 


500 


18 (7) 




18 (7) 


19 (6) 


500 


26 (6) 




26 (6) 


27 (8.20) 


0.4 


500 


25 (8) 




25 (10) 


26 (9.70) 


500 


34 (7) 




33 (8) 


33 (8.39) 


0.6 


500 


32 (9) 




31 (8) 


32 (9) 


500 


39 (7) 




38 (7) 


38 (6.71) 


0.8 


500 


36 (8) 




35 (9) 


39 (7.46) 


500 


40 (6) 




42 (7) 


42 (6.15) 
Table 6

The MMMS and associated RSD (in parentheses) of the simulated examples for logistic
regressions when p = 5000 and q = 15

ρ     n     SIS-MLR         SIS-MMLE        LASSO           SCAD            RRCS

            s = 3, β = (1, 1.3, 1)^T
0     300   3 (0)           3 (0)           3 (1)           3 (1)           3 (0)
0.2   300   3 (0)           3 (0)           3 (0)           3 (0)           3 (0)
0.4   300   3 (0)           3 (0)           3 (0)           3 (0)           3 (0)
0.6   300   3 (0)           3 (0)           3 (0)           3 (1)           3 (0)
0.8   300   3 (1)           3 (1)           4 (1)           4 (1)           3 (1.49)

            s = 6, β = (1, 1.3, 1, 1.3, 1, 1.3)^T
0     300   12.5 (15)       13 (6)          7 (1)           6 (1)           12 (24.62)
0.2   300   6 (0)           6 (0)           6 (0)           6 (0)           6 (0.18)
0.4   300   6 (1)           6 (1)           6 (1)           6 (0)           7 (1.49)
0.6   300   7 (2)           7 (2)           7 (1)           6 (1)           8 (1.49)
0.8   300   9 (2)           9 (3)           27.5 (3725)     6 (0)           9 (2.23)

            s = 12, β = (1, 1.3, ...)^T
0     300   297.5 (359)     300 (361)       72.5 (3704)     12 (0)          345 (522)
0.2   300   13 (1)          13 (1)          12 (1)          12 (0)          13 (1.49)
0.4   300   14 (1)          14 (1)          14 (1861)       13 (1865)       14 (0.74)
0.6   300   14 (1)          14 (1)          2552 (85)       12 (3721)       14 (1)
0.8   300   14 (1)          14 (1)          2556 (10)       12 (3722)       14 (0.74)

            s = 15, β = (3, 4, ...)^T
0     300   479 (622)       482 (615)       69.5 (68)       15 (0)          629.5 (821)
0.2   300   15 (0)          15 (0)          16 (13)         15 (0)          15 (0)
0.4   300   15 (0)          15 (0)          38 (3719)       15 (3720)       15 (0)
0.6   300   15 (0)          15 (0)          2555 (87)       15 (1472)       15 (0)
0.8   300   15 (0)          15 (0)          2552 (8)        15 (1322)       15 (0)

(2004) through the following linear model:

$$
\text{Salary} = \beta_0 + \beta_1\,\text{Female} + \beta_2\,\text{PCJob}
  + \sum_{i=1}^{4} \beta_{2+i}\,\text{Edu}_i
  + \sum_{i=1}^{5} \beta_{6+i}\,\text{JobGrd}_i
  + \beta_{12}\,\text{YrsExp} + \beta_{13}\,\text{Age} + \varepsilon,
\tag{5.5}
$$

where the variable YrsExp is total years of working experience, computed
from the variables YrHired and YrsPrior. Fan and Peng (2004) deleted the
samples with age over 60 or working experience over 30 and used only 199
samples to fit model (5.5). The SCAD-penalized least squares coefficient
estimator of (5.5) is

$$\boldsymbol{\beta}_0 = (\beta_0, \beta_1, \ldots, \beta_{13})^T
  = (55.835, -0.624, 4.151, 0, -1.073, -0.914, 0, -24.643,$$
Table 7

The MMMS and associated RSD (in parentheses) of the simulated examples for logistic
regressions when p = 2000 and q = 50

ρ     n     SIS-MLR         SIS-MMLE        LASSO           SCAD            RRCS

            s = 3, β = (3, 4, 3)^T
0     200   3 (0)           3 (0)           3 (0)           3 (0)           3 (0)
0.2   200   3 (0)           3 (0)           3 (0)           3 (0)           3 (0)
0.4   200   3 (0)           3 (0)           3 (0)           3 (1)           3 (0)
0.6   200   3 (1)           3 (1)           3 (1)           3 (1)           3 (0.74)
0.8   200   5 (5)           5.5 (5)         6 (4)           6 (4)           4 (2.4)

            s = 6, β = (3, ...)^T
0     200   8 (6)           9 (7)           7 (1)           7 (1)           8 (5.97)
0.2   ?     ?               ?               9 (1)           9 (2)           14 (28.54)
0.4   200   51 (77)         64.5 (76)       20 (10)         16.5 (6)        72 (76.60)
0.6   300   77.5 (139)      77.5 (132)      20 (13)         19 (9)          84.5 (122.94)
0.8   400   306.5 (347)     313 (336)       86 (40)         70.5 (35)       249.5 (324.62)

            s = 12, β = (3, 4, ...)^T
0     600   13 (6)          13 (7)          12 (0)          12 (0)          13 (3.90)
0.2   600   19 (6)          19 (6)          13 (1)          13 (2)          16.5 (4)
0.4   600   32 (10)         30 (10)         18 (3)          17 (4)          23 (7)
0.6   600   38 (9)          38 (10)         22 (3)          22 (4)          29 (8.95)
0.8   600   38 (7)          39 (8)          1071 (6)        1042 (34)       35 (8)

            s = 24, β = (3, 4, ...)^T
0     600   180 (240)       182 (238)       35 (9)          31 (10)         190.5 (240.48)
0.2   600   45 (4)          45 (4)          35 (27)         32 (24)         40 (5)
0.4   600   46 (3)          47 (2)          1099 (17)       1093 (1456)     45 (4.40)
0.6   600   48 (2)          48 (2)          1078 (5)        1065 (23)       47 (3)
0.8   600   48 (1)          48 (1)          1072 (4)        1067 (13)       47 (2.98)

$$-22.818, -18.803, -13.859, -7.770, 0.193, 0)^T.$$

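For this data set, we consider a larger artificial model as a full model with
additional predictors:

$$
Y = \beta_0 + \sum_{i=1}^{13} \beta_i X_i + \sum_{i=14}^{[2p/5]} \beta_i X_i
  + \sum_{i=[2p/5]+1}^{p} \beta_i X_i + \varepsilon,
$$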

where we set $(\beta_0, \ldots, \beta_{13})^T = \boldsymbol{\beta}_0$, which is
identical to the estimate for (5.5) above by Fan and Peng (2004), and set
$\beta_i = 0$ for $13 < i \le p$. Hence $X_{3j}$, $X_{6j}$, $X_{13,j}$ and
$X_{ij}$, $13 < i \le p$, are insignificant covariates, whose corresponding
coefficients are zero. The data are generated as follows. The vectors
$(X_{1j}, \ldots, X_{13,j})$, $j = 1, \ldots, n$, correspond to the covariates
in (5.5) and are resampled from the 199 real observations without replacement.
For each $i$ with $14 \le i \le [2p/5]$, the $X_{ij}$ are generated independently
from the Bernoulli distribution with success probability $p_i^*$, where $p_i^*$
is sampled independently from the uniform distribution on the interval
$[0.2, 0.8]$, and the $X_{ij}$ with $[2p/5] + 1 \le i \le p$ are generated
independently from the standard normal distribution. Further, the noises
$\varepsilon_j$, $1 \le j \le n$, are generated from the normal distribution
with zero mean and standard deviation $\sigma = 1, 2, 3$, respectively.
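
The generation of the artificial covariates described above can be sketched
as follows. This is only an illustration under stated assumptions: the
function name `simulate_extra_covariates` is hypothetical, $[2p/5]$ is read
as the integer part of $2p/5$, and the 13 real covariates resampled from
the salary data are omitted because the data are not reproduced here.

```python
import random

def simulate_extra_covariates(n, p, sigma, seed=0):
    """Draw the artificial covariates X_ij (14 <= i <= p) and the noise eps_j.

    Hypothetical helper: covariates 1..13 come from the real data and are
    not generated here.
    """
    rng = random.Random(seed)
    cut = (2 * p) // 5  # assumed reading of [2p/5]: integer part
    columns = {}
    for i in range(14, p + 1):
        if i <= cut:
            # One success probability per covariate, p_i^* ~ Uniform[0.2, 0.8]
            p_star = rng.uniform(0.2, 0.8)
            columns[i] = [1 if rng.random() < p_star else 0 for _ in range(n)]
        else:
            # Remaining covariates are standard normal
            columns[i] = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Normal noise with zero mean and standard deviation sigma
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    return columns, noise

columns, noise = simulate_extra_covariates(n=180, p=200, sigma=1.0)
```

With $n = 180$ and $p = 200$, indices 14 to 80 form the binary block and
indices 81 to 200 the continuous block.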

To compare the performance of the different methods, we set the sample
size $n$ to 180 and consider the dimensions $p = 200$, 400, 600 and 1000.
In the sure screening step we retain $d_n = 15$, 30, 60, 120 and 179
predictors using each of three methods: RRCS, SIS and the generalized
correlation rank method (gcorr) proposed by Hall and Miller (2009). We then
compute, for each method, the proportion of selected models that include
the true one. The experiment is repeated 200 times, and the results are
reported in Table 8 for the various combinations of $p$ and $d_n$.
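
The evaluation loop just described (rank predictors by a marginal rank
correlation, keep the top $d_n$, and record how often the true model is
covered) can be sketched as below. This is a minimal sketch, not the
authors' code: it uses the plain Kendall $\tau$ (tau-a) as the ranking
utility, and the toy data in `coverage` (independent standard normal
predictors, a linear response with unit coefficients) is an assumption
made only so that the example runs on its own. All function names are
hypothetical.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """Kendall tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

def screen(columns, y, d_n):
    """Return the indices of the d_n predictors with the largest |tau|."""
    scores = {k: abs(kendall_tau(col, y)) for k, col in columns.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:d_n])

def coverage(true_set, n, p, d_n, reps, seed=0):
    """Proportion of replications whose screened set contains true_set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Toy design (assumption): independent N(0, 1) predictors
        columns = {k: [rng.gauss(0.0, 1.0) for _ in range(n)] for k in range(p)}
        y = [sum(columns[k][j] for k in true_set) + rng.gauss(0.0, 1.0)
             for j in range(n)]
        hits += int(true_set <= screen(columns, y, d_n))
    return hits / reps

proportion = coverage(true_set={0, 1}, n=40, p=20, d_n=5, reps=10)
```

The pairwise loop costs $O(n^2)$ per predictor, which is adequate for
sample sizes such as $n = 180$; faster $O(n \log n)$ algorithms exist but
are not needed for an illustration.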

From Table 8, we can see that the RRCS procedure works well in screening
out insignificant predictors when there are categorical covariates. In
contrast, the SIS and gcorr methods almost never select the true model.
In most of the repeated experiments, we find that one or two significant
predictors are not selected by the SIS and gcorr methods even when
$d_n = n - 1 = 179$ predictors are retained.

For SIS, such a result is consistent with the numerical study of Example 2
in Fan, Feng and Song (2011): with a complex correlation structure among
the predictors and the response, SIS does not work well. As for the
generalized correlation screening method, its computation is complicated,
especially because it has to use different methods to calculate the
generalized coefficients between the response and the categorical and
continuous predictors, respectively. The variability of these coefficient
estimates then differs across predictor types, which makes the final sure
screening results less stable than those of RRCS and SIS.

5.2. Application to cardiomyopathy microarray data. Please see the sup- 
plementary material for the paper [Li et al. (2012)]. 

6. Concluding remarks. This paper studies the sure screening properties
of robust rank correlation screening (RRCS) for ultra-high dimensional
linear regression models and transformation regression models. The method
is based on the Kendall τ rank correlation, which is a robust measure of
correlation between two random variables and is invariant to strictly
monotonic transformations. Our results reveal the relationship between the
Pearson correlation and the Kendall τ rank correlation under certain
conditions. This suggests that the Kendall τ rank correlation can replace
the Pearson correlation, so that sure screening is applicable not only to
linear regression models but also to more general nonlinear regression models.
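
The invariance property mentioned above can be checked numerically. The
following sketch, on made-up illustrative data, compares the Kendall τ and
the Pearson correlation before and after a strictly increasing
transformation of the response; the function names and the choice of
transformation $y \mapsto e^{2y}$ are assumptions for illustration only.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """Kendall tau-a computed from the signs of pairwise differences."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (not from the paper)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.4, 1.3, 1.1, 2.2, 2.4, 3.5]
y_transformed = [math.exp(2.0 * v) for v in y]  # strictly increasing map

tau_before = kendall_tau(x, y)
tau_after = kendall_tau(x, y_transformed)
r_before = pearson(x, y)
r_after = pearson(x, y_transformed)
```

Because the transformation preserves the ordering of the responses, every
pairwise sign is unchanged and `tau_before` equals `tau_after`, whereas the
Pearson correlation changes with the scale of the response.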

In both the theoretical analysis and the numerical study, RRCS has been
shown to be capable of reducing the exponentially growing dimensionality
of the model to a value smaller than the sample size. It is also robust against
the error distribution. An iterative RRCS (IRRCS) has also been proposed
to enhance the performance of RRCS for more complicated ultra-high
dimensional data.

Table 8

For Example 5: the proportion of RRCS, SIS and gcorr that include the true model

Some issues deserve further study. From Fan and Song (2010), it is clear
that the sure screening properties of MMLE for generalized linear
models depend on $\mathrm{Cov}(X_k, Y)$, $k = 1, 2, \ldots, p$. Hence, it is an
interesting problem to determine whether the relationship between the Pearson
correlation and the Kendall τ rank correlation can be identified for
generalized linear models. If this can be done, the sure screening properties
of RRCS for generalized linear models can also be studied theoretically. Note
that the conditions RRCS requires are much weaker than those SIS needs. Thus,
it would be of interest to determine whether robust LASSO, SCAD or other
penalized methods can be defined when the idea described herein is applied.

APPENDIX: PROOFS OF THEOREMS 
Please see the supplementary material for the paper [Li et al. (2012)]. 
SUPPLEMENTARY MATERIAL

Supplement to "Robust rank correlation based screening" 

(DOI: 10.1214/12-AOS1024SUPP; .pdf). Application to Cardiomyopathy 
microarray Data and the proofs of Theorems 1-3 and Proposition 1 require 
some technical and lengthy arguments that we develop in this supplement. 